Smarter technology for a Smarter Planet:


Finding meaning 
in the noise. 


An unprecedented amount of information flows through companies every 
day. But to what effect? A recent study found that 52% of managers have 
no confidence in the information they rely on to do their job. And 42% of 
them actually use the wrong information at least once a week. Without 
the right approach to business intelligence, companies struggle to turn all 
that information into sound decisions. 


IBM business intelligence and performance management solutions give 
you the smarter tools you need to access the right information, making 
it available to the right people when and how they need it. Today IBM 
is helping over 20,000 companies spot trends, mitigate risk and make 
better decisions, faster. In fact, we helped a major retail supplier achieve 
this by cutting their average financial reporting time by almost 50%. 


A smarter business needs smarter software, systems and services. 
Let's build a smarter planet. ibm.com/intelligence


IBM, the IBM logo, ibm.com, Smarter Planet and the planet icon are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other 
product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml. 
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t “Space and time are not conditions in which we live; * 
| : they are simply modes in which we think." 
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- Albert Einstein 


"Don't | know it!" 


- Denny Cherry,
SQL Server MVP, DBA,
and LiteSpeed® with Fast Compression user
Save Space, Time and Money


It's Not Quantum Physics. It’s LiteSpeed® for SQL Server's 40:1
Fast Compression Technology. 


“Using LiteSpeed's new Fast Compression technology is like Quest giving me extra storage 
space and time,” said Denny Cherry. “Not only does it store 10 times the number of backups on
disk, but the process is done in a fraction of the time it took me using traditional compression. 
Now, my recovery operations are fast and easy! It also does the ‘heavy lifting’ for me by 
examining every database every night, to make the best backup decisions based on change 
rates, day of the week and my retention needs. So, I'm no longer wasting my time manually 
checking everything. 


BE A SQL SERVER ROCK STAR!


“Bottom line? I don't have to worry about storage space or backup and recovery time. It's the
coolest gift Quest could give someone in charge of SQL Server backups.”


LiteSpeed® with Fast Compression vs. Other Tools
Learn more. Read "How to Achieve 40:1 Backup Compression with
LiteSpeed® for SQL Server's Fast Compression" at www.quest.com/40to1
Calculating Concurrent
Sessions, Part 2 


—Itzik Ben-Gan

Itzik presents a set-based solution with linear complexity 
that performs better than the previous solutions. He also 
discusses a set-based solution that will outperform all 
other solutions once SQL Server supports it. 


Administer WSS 3.0 Content 
Databases in SQL Server 


—Wendy Henry 

Find out how to use specific T-SQL queries to pull useful 
information out of tables in SharePoint’s content 
databases. 


Passing NULL Parameters 
—William Vaughn 

Have you ever had a stored procedure that could accept 
a NULL value as a parameter but didn’t seem to work 
correctly when you did so? Here’s what might have gone 
wrong. 
Design Your Data Warehouse
Storage for Performance 
—Michelle A. Poolet 

Follow these steps to ensure maximum performance from 
your storage subsystem. 


Building Your First Cube 

—Derek Comingore 

Jump into advanced BI by using this step-by-step guide to 
make your first cube. 


DBCC CHECKDB for Very
Large Databases
—David Paul Giroux 
Running DBCC can be problematic on VLDBs. Use the 
author’s Admin/Worker Job approach to ease the process. 


Smarter technology for a Smarter Planet: 


Is your information 
withholding information? 


Most businesses have a data management strategy. And another
data management strategy. And another data management strategy.
One for every application: ERP, CRM, SCM, HRM, etc. The result is a
proliferation of siloed data and disjointed information that gets in the
way of smart decisions.

An Information Agenda from IBM moves you from an application- 
centric approach to your information toward a broader, more holistic 
view of all of your information systems. So you can make use of your 
data to make decisions faster and with greater confidence. This insight 
can help you optimize your processes, predict market changes and 
turn your information into a strategic asset. Banks can better manage 
financial risk. Retail companies can crystallize trends. Manufacturing 
companies can speed delivery across a complex supply chain. It’s a 
way to make information work for you, instead of vice versa. 


A smarter business needs smarter software, systems and services. 
Let’s build a smarter planet. ibm.com/infoagenda 
Smart Systems Management
SQL Server Magazine 
Editors’ Best Awards 


Company: Quest Software 

Category: Best Backup and Recovery 

Product: Quest LiteSpeed for SQL Server 5.1.1 
Award: Gold 


Company: Quest Software 
Category: Best Development Tool 
Product: Toad for SQL Server 
Award: Gold 


SQL Server Magazine 
Community Choice Awards 


Company: Quest Software
Category: Best Database Monitoring and Performance Product
Product: Foglight Performance Analysis for SQL Server 

Award: Silver 


Company: Quest Software 

Category: Best SharePoint Product 

Product: Site Administrator for SharePoint 
Award: Gold 


Learn more about Quest Software here: www.quest.com/SQLServerAward + 800.306.9329 
SQL Server PowerShell Extensions


—Kevin Kline 
Automate even the most difficult SQL Server administrative tasks 
using this free PowerShell script set. 
Is SQL Server 2008 R2 for You?

Editor’s Tip


Here are three reasons to appreciate how truly remarkable the SQL Server
community is. Also, check out this overview of business intelligence, its
capabilities, the barriers of corporate culture, and how to help your
organization become an intelligent enterprise.

Are you following SQL Mag on Twitter yet? Check out www.twitter.com/SQLServerMag.
We want to follow you, too. Tell us what you're doing!
— Megan Keller, associate editor 


It’s been a blast celebrating 10 years of SQL Mag with you. We're 
looking forward to the next 10 years of sharing quality content 
with the SQL Server community—in print and online. This is 
YOUR magazine and online resource: Tell us what you want more
of—DBA, developer, or BI content. Send your thoughts and article
ideas to mkeller@sqlmag.com and smolnar@sqlmag.com.
Build Client-Side User Interfaces with
Applications DAN WAHLIN
Is SQL Server 2008 R2 for You?


With SQL Server 2008 R2 just around the corner, how do you decide if this upgrade is for
you? This could depend on the version of SQL Server you're currently running. Or it could
depend on the needs of your organization and the pain points you're currently feeling. In
tight economic times, many businesses have stuck with the tried and true solutions of SQL
Server 2005 and even SQL Server 2000. As they say, “If it ain't broke . . .” However, your
business could get left behind if it stays with an old release too long—IT skills become
dated, and the team can lose touch with current technology. Even worse, your organization
misses out on the benefits that new features can offer. This could mean that ongoing
business problems go unresolved. Or that new projects and user requests never get off the
ground. If your company can’t offer the same services as your competitors, it could put
your business at a competitive disadvantage.


The Benefits of SQL Server 2008 R2 
SQL Server 2008 R2 builds on the SQL Server 2008 release and rolls up all the SQL Server
2008 service packs. In addition, SQL Server 2008 R2 offers a host of new functionality—most
of it focused on providing managed self-service business intelligence (BI). This
technology, called PowerPivot, empowers end users, but keeps control of the BI process in
the hands of the IT group. To learn more, check out SQL Server Magazine’s October 2009
interview with Donald Farmer at InstantDoc ID 102613. Other big features slated for the SQL
Server 2008 R2 release include support for up to 256 processors, slipstream installs, new
Master Data Services, enhanced multi-server management, and hot-standby database
mirroring.


Upgrading from 2008:
An Easy Decision

If you're currently running SQL Server 2008, then the decision to upgrade to SQL Server
2008 R2 primarily depends on the value that the new managed self-service BI features bring
to your organization, and to a lesser degree it depends on how much multi-server management
pain your organization is feeling.


SQL Server Magazine * www.sqlmag.com


Upgrading from 2005:
More Functionality

If you're running SQL Server 2005, then there's a lot of new functionality to be gained by
moving to SQL Server 2008 R2. SQL Server 2008 R2 includes the new BI functionality as well
as the whole set of SQL Server 2008 features, including database backup compression;
transparent data encryption; new date, time, and spatial data types; the new FILESTREAM
data type; and the Resource Governor and policy-based management.


Upgrading from 2000: Get Back in 
the Game 

If your business is still running SQL Server 2000, then you can gain an incredible amount
of new functionality by upgrading to SQL Server 2008 R2. Many of the subsystems users of
later versions may take for granted, such as SQL Server Integration Services (SSIS) and
SQL Service Broker, were introduced with SQL Server 2005, as was the SQL CLR. More
importantly, even though a web update provided SQL Server Reporting Services (SSRS) for SQL
Server 2000, out-of-the-box SQL Server 2000 didn’t offer SSRS, and few businesses took
advantage of the web downloads.

Moving to SQL Server 2008 R2 gains SQL Server 2000 users the impressive feature set
introduced in SQL Server 2005, plus the SQL Server 2008 and 2008 R2 features. Besides, most
SQL Server 2000 setups are running on dated hardware due for a refresh. When you update
that hardware, make the move to a new release.

InstantDoc ID 103000 


Michael Otey

(motey@sqlmag.com) is technical director for Windows IT Pro and SQL Server Magazine and
author of Microsoft SQL Server 2008 New Features (Osborne/McGraw-Hill).
SQL Server PowerShell Extensions

Use this free tool to automate administrative tasks


PowerShell is a very powerful scripting language that can amplify your ability to automate
almost any administrative function. This month's free tool is a set of handy PowerShell
scripts, SQL Server PowerShell Extensions (SQLPSX). SQLPSX, which was written by Chad
Miller, a SQL Server DBA living in the Tampa, FL area, automates many common administrative
functions in SQL Server.
PowerShell scripts offer DBAs several advantages over the standard T-SQL and SQL Server
Integration Services (SSIS) approach to automation. Some benefits of PowerShell include

* Easy multiserver automation that lets you perform any given function across multiple SQL
  Server instances
* Easier access to Windows resources, such as files, folders, Windows Services, and printers
* Fast and easy data loads when you don’t need the sophistication of SSIS
* Quick and easy retrieval of properties of objects and processes on the server


You can learn more about PowerShell by visiting the official Windows PowerShell page at
www.microsoft.com/windowsserver2003/technologies/management/powershell/default.mspx or by
reading Miller's blog post “The Value Proposition of PowerShell to DBAs” at
chadwickmiller.spaces.live.com/blog/cns!EA42395138308430!347.entry. Now let's take a look
at what SQLPSX has to offer.


Function Calls and Scripts in 
SQLPSX 

In a nutshell, SQLPSX contains PowerShell scripts to perform more than 100 administrative
SQL Server tasks, although many of the function calls and scripts focus on security
settings for logins, users, roles, and permissions. SQLPSX is available on the CodePlex
website, and I recommend reading the Readme.Txt file included with the project because it
contains a full description of each function available. The following are some of the
functions you can use to automate typical SQL Server tasks:
* Get-SqlServer calls the Microsoft.SqlServer.Management.SMO.Server object and retrieves a
  list of all available SQL Server systems.
* Get-SQLData retrieves a SQL Server result set.
* Get-SQLDatabase retrieves the properties for one or more databases.
* Get-SQLUser retrieves an SMO user object for one or more users, with added properties
  showing all objects owned by the user.


In addition to function calls and scripts, SQLPSX provides a reporting element. Once you've
installed the SQLPSX PowerShell functions, you can create a database to store their output
and then view that data using SQL Server Reporting Services reports and queries, analyzing
the security information via Business Intelligence Development Studio (BIDS) or Visual
Studio.


SQLPSX’s System Requirements 

You can download SQLPSX from www.codeplex.com/SQLPSX. This tool requires SQL Server 2008,
the Server Management Objects (SMO), and PowerShell. SMO is installed by default with SQL
Server Management Studio (SSMS), so if you have the native tools for SQL Server 2005 or
later, you're good to go. Once you've installed SMO or SSMS and PowerShell, you'll need to
set PowerShell's execution policy to RemoteSigned. (The exact way to set this policy varies
by OS.) You might also need to unblock the SQLPSX PowerShell scripts so that they can run
without constraint. Refer to the SQLPSX documentation for detailed instructions on enabling
PowerShell execution on your SQL Server systems.

Miller keeps a DBA-centric blog with lots of PowerShell information at
chadwickmiller.spaces.live.com/default.aspx. I encourage you to read this blog to get
familiar with PowerShell.
Kevin Kline

(kevin.kline@quest.com) is the director of technology for SQL Server Solutions at Quest
Software and a founding board member of the international PASS. He is the author of SQL in
a Nutshell, 3rd edition (O'Reilly).
BENEFITS: SQLPSX provides you with a ready-made mix of PowerShell scripts to automate even
the most difficult SQL Server administrative tasks, including database maintenance,
provisioning, and authorization.

SYSTEM REQUIREMENTS: PowerShell; Server Management Objects; SQL Server 2008

HOW TO GET IT: You can download SQLPSX from www.codeplex.com/SQLPSX.
Such was the case with our Editors’ Best and Community Choice awards this year. The former award
program highlights products that SQL Server Magazine and Windows IT Pro editors and contributors
believe are worthy of recognition, whereas the latter program turns that process over to you, our readers.
Our Community Choice awards allowed readers to decide which products and services were chosen for
acclaim and recognition. Rather than presenting a predefined list of products and services that limited your
selection to choices our editorial team had already made, this year we decided to open up the process to everyone
and let you determine the products and services that were worthy of inclusion in our final voting phase. We also
encouraged DBAs, developers, and IT pros to submit comments about why they selected the products they did,
so you'll see lots of insightful comments and real-world wisdom from your peers about their favorite products
on the pages that follow.

Unlike last year—when we treated both award programs as separate entities—we decided to merge the 
award programs this year. We’ve listed the top three Editors’ Best products in each category directly adjacent 
to our Community Choice winners. Sometimes our editors and readers agreed on what products and services 
were best in a given category, and sometimes they didn’t. Yet regardless of whether these winners were picked 
by editors or readers, one thing is certain: All these awards recognize products and services that are considered 
the best of the best in their respective categories. 

By presenting our Community Choice and Editors’ Best award picks next to each other this year, we're 
hoping we'll encourage some dialogue about the selections that were made. Do you agree with the choices our 
editors made? Or do the picks that our readers made carry more weight? Please let us know what you think by 
emailing us your comments, or by visiting our forums (www.sqlmag.com/forums) and writing a post or two. 
We chose - and you chose! - from an impressive crowd of unique offerings


Best Backup and Recovery Software Product 


Editors’ Best 

Gold: LiteSpeed for SQL Server 

Quest Software * www.quest.com 
Silver: SQL safe backup 

Idera * www.idera.com 

Bronze: SQL Backup Pro 

Red Gate Software * www.red-gate.com 


“LiteSpeed sets the bar for other SQL Server backup programs. LiteSpeed has always used
compression to reduce your backup windows, but the new release has also added support for
the new FILESTREAM data type, and a new SmartDiff allows you to specify conditions for full
or differential backups.”
—Michael Otey, technical director, Windows IT Pro 
and SQL Server Magazine 


“Few competitive products can match the robust feature set in LiteSpeed for SQL Server.
Quest Software’s new SmartDiff technology scrunches down database backups into even smaller
file sizes and is one of the reasons why LiteSpeed gets my vote.”

—Jeff James, Windows IT Pro 


Community Choice 

Gold: Hyperbac for SQL Server 
Hyperbac * www.hyperbac.com 
Silver: NetVault: Backup

BakBone * www.bakbone.com 
Bronze: Simpana 

CommVault * www.commvault.com 


Quotes from your community about Hyperbac for SQL Server ...

“Seamless integration with SQL Server!” 
“Refreshing in its simplicity.” 


Best Business Intelligence and Reporting Tool 


Editors’ Best 

Gold: Tableau 

Tableau Software * www.tableausoftware.com

Silver: NovaView 

Panorama Software * www.panorama.com 

Bronze: Analyzer 

Strategy Companion * www.strategycompanion.com 


“What I like about Tableau 5.0 is that SQL Server professionals don’t have to spend a lot
of time training business users how to use it—anyone can quickly learn to use this product
to create active dashboards and reports and analyze data, as long as they have access to
the Internet. Tableau 5.0 also lets you tie in to multiple data sources and create
interactive visualizations that help you better understand your data, and therefore help
you make informed business decisions, faster.”
—Megan Keller, associate editor, 
SQL Server Magazine 


“Giving stakeholders the information they need in a format they can understand is
invaluable, and Tableau 5.0 does that better than just about any other BI tool.”
—Jeff James, Windows IT Pro 


“Filling the gaping hole left by Microsoft's absorption of ProClarity, Strategy Companion's
Analyzer is the best solution to complete the Microsoft BI platform. For analysts, power
users, and general report consumers, Analyzer supports the full range of SQL Server
Analysis Services features. It has a zero-footprint client interface, making it simple to
deploy and manage, with delivery options for SharePoint, Excel, and IE. Analyzer offers a
powerful and intuitive set of analysis tools and visualizations that let business users
make more confident decisions.”
—Douglas McDowell, contributing editor,
SQL Server Magazine
Community Choice

Gold: IT Analytics 

Symantec * www.symantec.com 

Silver: Crystal Reports 

Business Objects * www.businessobjects.com
Bronze: XtraReports Suite 

DevExpress * www.devexpress.com 


Quotes from your community about Symantec's IT Analytics ...

“Leverages all the data inside the Altiris platform.”

“Default cube schemas and reports, visual quality, ease of use, dynamic tables and graphs,
benefits of SQL reporting services.”


Best Database Management Product 


Editors’ Best 

Gold: SQL Toolbelt 

Red Gate Software * www.red-gate.com 
Silver: SQL Defrag Manager 2.5 

Idera * www.idera.com 

Bronze: DatabaseSpy 2009 

Altova * www.altova.com 


“The Red Gate SQL Toolbelt pulls together 13 SQL Server management and development tools
you can buy individually for less than half the price. Getting good value for your tools
investment makes a lot of sense in a down economy.”
—Sheila Molnar, executive editor,
SQL Server Magazine


“SQL Defrag Manager 2.5 lets you quickly analyze indexes and defragment them. With Defrag
Manager you can automate index defragmentation by creating policies, which can be
scheduled, in which you can specify thresholds for fragmentation levels and index scan
densities. SQL Defrag Manager 2.5 has great features and an easy-to-use GUI.”
—Michael K. Campbell, contributing editor,
SQL Server Magazine

ScriptLogic’s Security Explorer for SQL Server


“If you're managing more than one flavor of database, you'll want to check out DatabaseSpy.
This multipurpose toolset lets database administrators connect to not only SQL Server, but
also Oracle, IBM DB2, Sybase, and MySQL.”
—Sheila Molnar, executive editor,
SQL Server Magazine


Community Choice 

Gold: Security Explorer for SQL Server 
ScriptLogic * www.scriptlogic.com 
Silver: SQL Toolbelt 

Red Gate Software * www.red-gate.com 
Bronze: SQL diagnostic manager 

Idera * www.idera.com 


Quotes from your community about ScriptLogic’s Security Explorer for SQL Server ...
“The extremely granular management of permission sets makes my job easier.”

“A comprehensive security solution.” 


Best Database Monitoring and Performance Product


Editors’ Best 

Gold: Performance Advisor for Analysis Services 
SQL Sentry * www.sqlsentry.net

Silver: Zero Impact SQL Monitor 

SQL Power Tools * www.sqlpower.co.uk 

Bronze: Ignite 

Confio Software * www.confio.com 


“Analysis Services’ presence is becoming stronger throughout enterprises as more companies
are realizing and adopting BI-related solutions and platforms. Analysis Services
professionals have had to rely on out-of-the-box APIs, counters, and traces to obtain
Analysis Services monitoring capabilities. With the release of SQL Sentry’s Performance
Advisor for Analysis Services, this is no longer the case. The Performance Advisor for
Analysis Services product makes for a great companion tool for the Analysis Services
professional.”
—Derek Comingore, contributor,
Windows IT Pro


“What I most appreciate about Performance Advisor for Analysis Services is that it lets you
monitor SQL Server Analysis Services’ (SSAS’) resource utilization in real-time mode or
historic mode, then easily troubleshoot bottlenecks from its intuitive Performance
Dashboard for SSAS. It also offers monitoring and alerting for SQL Server Reporting
Services and SQL Server Integration Services, so you can keep an eye on your entire BI
platform from one console.”
—Megan Keller, associate editor,
SQL Server Magazine


SQL Sentry's Performance Advisor for Analysis Services


Community Choice

Gold: Zero Impact Enterprise Monitor
SQL Power Tools * www.sqlpower.com
Silver: Foglight Performance Analysis for SQL Server
Quest Software * www.quest.com
Bronze: Performance Advisor for SQL Server
SQL Sentry * www.sqlsentry.net


Quotes from your community about SQL Power Tools’ Zero Impact Enterprise Monitor ...

“Zero Impact Enterprise Monitor is essential in our company for diagnosing performance
problems.”

“Valuable information, succinctly delivered.” 


Best Development Tool 


Editors’ Best 

Gold: Toad for SQL Server 

Quest Software * www.toadsoft.com 
Silver: SQL Developer Bundle 

Red Gate Software * www.red-gate.com 
Bronze: XtraPivotGrid Suite 
DevExpress * www.devexpress.com 


“Developers and DBAs get a lot of bang for the buck with this comprehensive toolset. Quest
Toad is a great set for administrators or developers with crossover responsibilities. You
get not only a management tool suite, but also a set of development tools you can use to
bridge the gap with Visual Studio.”
—Sheila Molnar, executive editor, 
SQL Server Magazine 




“The Red Gate SQL Developer Bundle toolbox is constantly updated with the newest versions
of Red Gate tools. Mike Campbell is a big fan of Red Gate tools. Here's Mike on one of the
tools in the Bundle, Refactor: ‘One of the reasons I really love it is that it comes with
code-formatting tools that work very well. Just a couple of menu item selections and code
goes from hideous to following a large number of conventions that would be there had I
written the code.’”
—Sheila Molnar, executive editor,
SQL Server Magazine


“The DevExpress XtraPivotGrid Suite provides OLAP data mining and cross-tab reporting for
WinForms. New for 2009 is an updated version of the Expressions editor and support for 50
new functions.”
—Sheila Molnar, executive editor,
SQL Server Magazine
Community Choice

Gold: Adobe Dreamweaver 

Adobe * www.adobe.com 

Silver: Coderush 

DevExpress * www.devexpress.com 
Bronze: RadControls 

Telerik * www.telerik.com


Quotes from your community about Adobe Dreamweaver ...

“This web editor simply does everything.”
“Everything is faster, easier, and more intuitive with Dreamweaver.”
“Very highly recommended to anyone building websites.”


Best Hardware: Server 


Editors’ Best 

Gold: HP ProLiant DL380 series 
HP * www.hp.com
Silver: NEC 5800 series
NEC * www.nec.com

Bronze: Dell PowerEdge 

Dell * www.dell.com 


“The ProLiant line of servers is likely represented in every data center in existence. The
DL360 and DL380 are the workhorses of many IT shops, and for good reason: reasonably
priced, extensive support options, and a myriad of configurations.”
—Michael Dragone, contributing editor, 
Windows IT Pro 


“Hardware is hardware. The CPU comes from Intel or AMD, the hard drives are
industry-standard, and the memory is OEM’d from Korea. The real question is, ‘Who is going
to answer the phone when you have a problem?’ HP support is rock solid. Period.”
—Eric B. Rux, contributing editor, Windows IT Pro


Your Top 10 Favorite IT Websites

10. Google (www.google.com)
9. Major Geeks (majorgeeks.com)
8. Microsoft TechNet (technet.microsoft.com)
7. The Register (www.theregister.co.uk)
6. Server Fault (www.serverfault.com)
5. Slashdot (slashdot.org)
4. Windows IT Pro (www.windowsitpro.com)
3. GPAnswers.com (www.gpanswers.com)
2. The CodeProject (www.codeproject.com)
1. Experts Exchange (www.experts-exchange.com)


“The HP ProLiant DL380 servers are fantastic virtualization hosts.”
—Alan Sugano, contributing editor, Windows IT Pro 


Community Choice 

Gold: HP ProLiant DL380 series 
HP * www.hp.com

Silver: Dell PowerEdge 2900 series 
Dell * www.dell.com 

Bronze: IBM BladeCenter Server 
IBM * www.ibm.com


Quotes from your community about HP’s ProLiant DL380 servers ...

“Excellent power, reliability, and manageability for a solid price.”

“HP products always have fewer problems 
than those of other vendors.” 

“Reasonably priced, reliable, and highly expandable.” 


Best Hardware: Storage 


Editors’ Best 

Gold: Intel SSD drives 

Intel * www.intel.com

Silver: nTier Deduplication appliance 
Spectra Logic * www.spectralogic.com 
Bronze: DroboPro 

Data Robotics * www.drobo.com 


“As the price of SSD drives continues to plummet, you'll need to seriously consider taking
the leap. It’s the most worthwhile upgrade you can make to any computer system these days,
and the Intel drives are among the best SSDs available.”
—Michael Dragone, contributing editor,
Windows IT Pro
“Intel SSDs—along with the rise of virtualization and the boom in iSCSI SAN adoption—are
undoubtedly contributing to a revolution of storage in the enterprise.”
—Jeff James,
Windows IT Pro


“The Drobo is exactly what today’s IT pros need—automated, easy-to-use, plug-and-play
backup functionality in the form of a cool gadget.”
—Jason Bovberg, senior editor,
Windows IT Pro


Community Choice 

Gold: EMC CLARiiON 

EMC * www.emc.com

Silver: Dell EqualLogic PS5000 

Dell * www.dell.com 

Bronze: NetApp FAS3100 

NetApp * www.netapp.com


Quotes from your community about EMC CLARiiON ...

“Easy-to-use, affordable networked storage.”

“The new virtual-aware EMC CLARiiON is perfect for my VMware environment.”

“Love the expandability.” 


Best SharePoint Product 


Editors’ Best 

Gold: ControlPoint for SharePoint 

Axceler * www.axceler.com 

Silver: Professional Archive Manager for SharePoint

Metalogix * www.metalogix.net 

Bronze: NearPoint for SharePoint 

Mimosa Systems * www.mimosasystems.com 


“Axceler ControlPoint helps IT pros get better control of their SharePoint environment
through permissions management, content management, in-depth usage analysis, policy
enforcement, and flexible alerts and scheduled analyses.”

—Jeff James, Windows IT Pro 


“ControlPoint provides the tools and intelligence to help you manage and monitor large
farms effectively, and it integrates well with the existing SharePoint UI; the ability to
manage user permission levels is nicely implemented.”
—Curt Spanburgh, contributing editor,
Windows IT Pro


Community Choice 

Gold: Site Administrator for SharePoint 
Quest Software * www.quest.com 

Silver: Colligo Contributor Pro 

Colligo Networks * www.colligo.com 
Bronze: CorasWorks Workplace Suite 10 
CorasWorks * www.corasworks.com 


Quotes from your community about Quest Software’s Site Administrator for SharePoint ...

“Has helped me completely understand and manage my entire SharePoint environment.”
“For SharePoint management of servers and sites, it's the best and most comprehensive
product out there.”


Best Training and Certification Product or Service


Editors’ Best 

Gold: LabSim 

TestOut * www.testout.com 

Silver: Train Signal Computer Training Videos 
Train Signal * www.trainsignal.com

Bronze: PrepLogic eLearning Videos 
PrepLogic * www.preplogic.com 


“TestOut’s LabSim is a true innovator in the IT training and
certification space. This online lab technology helps further
learning for professionals of all levels. Newcomers to the field can
gain a level of hands-on experience on or off campus unlike ever
before, and seasoned professionals have easy access to skills-based
online training to earn additional certifications or degrees.”

—Brian Reinholz, production editor, 

Windows IT Pro 


“The thing I like most about PrepLogic’s certification practice
exams, like Network+ 2009 practice exam, is the answers. You're not
going to get feedback like ‘C is the correct answer.’ Instead, the
exams explain why the correct answers are right and why the incorrect
answers are wrong. It’s a real learning experience.”
—Tom Carpenter, contributor, Windows IT Pro 
Community Choice

Gold: Train Signal Computer Training Videos

Train Signal * www.trainsignal.com
Silver: LabSim 

TestOut * www.testout.com 
Bronze: Global Knowledge IT Training Classes 
Global Knowledge * www.globalknowledge.com 


Quotes from your community about Train Signal's training videos ...

“TrainSignal videos are definitively the building blocks of creating
a solid foundation when learning a technology such as Exchange 2007.”

“Very polished, excellent instruction.” 


Best Virtualization Product 


Editors' Best 

Gold: VMware vSphere 4 

VMware * www.vmware.com 

Silver: NxTop 

Virtual Computer * www.virtualcomputer.com 
Bronze: Citrix XenServer 5.5 

Citrix * www.citrix.com 


“VMware vSphere 4 has a lot of nice new features, but you can justify
the upgrade by the increase in performance alone. We're seeing
performance increases of 20 to 30 percent and in some cases even
higher depending on the application with the same hardware.”
—Alan Sugano, contributing editor, Windows IT Pro 


“NxTop is a complete end-to-end solution that allows you to create
and deploy VMs to systems with a management console that helps you
keep track of who has what. It also has a remote swipe option so that
if a system gets stolen and boots up and connects, the VM
evaporates.”
—J. Peter Bruzzese, contributor, Windows IT Pro 


Community Choice 

Gold: VMware ESX Server 3.5 

VMware * www.vmware.com 

Silver: Symantec Endpoint Virtualization Suite 
Symantec * www.symantec.com 

Bronze: Citrix XenServer 

Citrix * www.citrix.com 


Quotes from your community about VMware 
ESX Server 3.5 ... 

“Simply the most important, sophisticated virtualization product on
the market.”

“It’s evolved into such a mature virtualization product!”


Best Vendor Tech Support 


Gold: Dell * www.dell.com 
Silver: Microsoft * www.microsoft.com 
Bronze: Symantec * www.symantec.com 


Your Top 10 Least Favorite Things about Working in IT


10. “Everybody thinks I can fix any problem with two mouse clicks.”

9. “The constant technology evolution: I’m outdated as soon as I get
something in place.”

8. “Balancing home life and work life.”

7. “Failing eyesight.”

6. “The terrible hours: Everyone from the CEO to the village dog
depends on me and will call me at 2 a.m. when their email is taking
longer than five minutes to arrive.”

5. “Everyone I know wants me to fix their computer.”

4. “The money.”

3. “If I fail, everything fails.”

2. “The smell.”

1. “End users.”
SQL Server Magazine Congratulates
HyperBac Technologies, Inc.


Storage Compression.
Object Level Recovery.
Data Security.
Transparent Integration.


Gold Winner
Best Backup & Recovery Software Product
2009 SQL Server Magazine Community Choice Awards


Learn more about HyperBac at www.hyperbac.com
Part 2


A better set-based solution 


Last month I presented a task to calculate the maximum number of
concurrent sessions for each application. Web Listing 1
(www.sqlmag.com, InstantDoc ID 102926) contains code to create and
populate a table called Sessions with a small set of rows to check
the correctness of the solutions. Web Listing 2 contains code to
create a helper table function called GetNums, which returns a table
result with a sequence of integers of a requested size. Web Listing 3
contains code to populate the Sessions table with a large set of rows
to test the solutions’ performance.

The task at hand involves calculating, for each 
application, the maximum number of concurrent 
sessions. That is, for each application, calculate the 
maximum number of simultaneously active sessions. 
Recall that in case one session ends at exactly the same 
time that another session starts, you need to implement 
a rule dictating whether both are considered active at 
that point. For our purposes, the assumption is that 
they aren’t. For the small set of sample data in Web 
Listing 1, the desired output is shown in Web Table 1. 

Last month I presented two solutions that I’ve used for years. One is
a set-based solution that uses a subquery with a count aggregate. I
refer to that solution as the original set-based solution. That
solution’s algorithmic complexity (or rather, the way its execution
plan scales) is quadratic. That is, if you increase the number of
rows per partition (application) by a factor of f for the same period
of time, the run time increases by a factor of f². So beyond very
small partitions, the solution doesn't scale well.
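
The original set-based solution appeared last month and isn't reproduced here. As a rough sketch of what a subquery with a count aggregate can look like (an assumption about its general shape, not necessarily last month's exact code), the following query counts, for each session's start time, the sessions of the same application that are active at that point. It assumes the Sessions table has the app, starttime, and endtime columns used in Listing 1.

```sql
-- Sketch only: for each session, count the sessions of the same app
-- that started at or before its start time and end after it.
-- The strict comparison on endtime follows the rule that a session
-- ending exactly when another starts isn't considered concurrent.
SELECT app, MAX(cnt) AS mx
FROM (SELECT S1.app,
        (SELECT COUNT(*)
         FROM dbo.Sessions AS S2
         WHERE S2.app = S1.app
           AND S2.starttime <= S1.starttime
           AND S2.endtime > S1.starttime) AS cnt
      FROM dbo.Sessions AS S1) AS C
GROUP BY app;
```

Because the subquery revisits rows of the partition for every outer row, the work per partition grows roughly with the square of its row count, which is the quadratic behavior described above.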

The second solution is cursor-based. The algorithmic complexity of
that solution is linear. That is, if the number of rows per partition
increases by a factor of f, the run time also increases by a factor
of f. The cursor-based solution scales better than the original
set-based solution, but it has the obvious disadvantages of cursors
related to readability, maintainability, and not being in accord with
the relational model.
For years I looked for a set-based solution that performs better than
the cursor-based solution for all partition sizes (to no avail, until
recently). In this article I present a new set-based solution with
linear complexity that performs better than the cursor-based
solution. I developed the solution based on insights of Darryl Page
from the UK, who attended one of my classes in which I gave this
problem as an exercise. I also present a set-based solution that’s
based on language elements that SQL Server doesn’t yet support (as of
SQL Server 2008). Once such support is introduced, the solution is
likely to perform better than all others.


New Set-Based Solution
Listing 1 contains the new set-based solution. Figure 1 shows the
execution plan for the solution. As you can see, the solution
consists of two parts. The first part involves creating a temporary
table called #Ends with a clustered index on (app, endtime), and
populating it with the result of a query against the Sessions table.
The query against Sessions retrieves, for each session, the
application name (app), session end time (endtime), and a rank
partitioned by app and ordered by endtime. The target column holding
the rank values is called n. The reason for using the RANK function
here rather than ROW_NUMBER has to do with the rule that says if a
session ends at the same point in time that another starts, the
sessions aren't considered concurrent. Rank values for multiple
sessions with the same end time will be equal to the lowest row
number you would get based on the same partitioning and ordering
specification. The rank value minus 1 indicates how many sessions
against the current application ended before the point in time when
the current session ended. This fact is important in the second part
of the solution.

The execution plan for the solution's first part is in the section
called Query 1 in Figure 1. The plan doesn’t involve sorting, neither
for the calculation of the rank values nor for populating the target
clustered index, which is a key aspect that contributes to the
solution’s performance. An ordered scan of the index created on
Sessions(app, endtime, starttime) supports the calculation of the
rank values and produces the result in the target clustered index
order.


LISTING 1: New Set-Based Solution

-- Part 1
CREATE TABLE #Ends
(
  app     VARCHAR(10) NOT NULL,
  endtime DATETIME,
  n       BIGINT
);

CREATE CLUSTERED INDEX idx_app_et ON #Ends(app, endtime);

INSERT INTO #Ends(app, endtime, n)
  SELECT app, endtime,
    RANK() OVER(PARTITION BY app ORDER BY endtime) AS n
  FROM dbo.Sessions;

-- Part 2
WITH Counts AS
(
  SELECT S.app, S.starttime,
    ROW_NUMBER() OVER(PARTITION BY S.app ORDER BY S.starttime)
      - A.n + 1 AS cnt
  FROM dbo.Sessions AS S
    CROSS APPLY (SELECT TOP (1) E.n
                 FROM #Ends AS E
                 WHERE E.app = S.app
                   AND E.endtime > S.starttime
                 ORDER BY E.endtime) AS A
)
SELECT app, MAX(cnt) AS mx
FROM Counts
GROUP BY app;

DROP TABLE #Ends;
GO


As for the second part of the solution, first consider the query used
to define the common table expression (CTE) Counts. The query uses a
CROSS APPLY operator to match each current session from the Sessions
table with the first session from the #Ends table where
#Ends.endtime is greater than Sessions.starttime. You need the rank
value from that session (column n). The query then calculates a row
number for each session partitioned by app, and ordered by starttime
(call it rownum); rownum indicates how many sessions started so far.
In case multiple sessions started at the same time, the maximum
rownum for those sessions is the correct indication of how many
sessions started so far. Now, rownum - (n - 1), which is equal to
rownum - n + 1, gives you the number of concurrent sessions at the
time the current session started (call it cnt). That is, if you
subtract from the row number the rank value of the first session that
ended after the current session started and add 1, you get the number
of concurrent sessions at the point when the current session started.
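
To make the rownum - n + 1 arithmetic concrete, here is a small worked example. It's only a sketch: the #S table and its four rows are hypothetical data made up for illustration (they are not the rows from Web Listing 1), a CTE stands in for the #Ends temporary table, and the multi-row VALUES clause assumes SQL Server 2008.

```sql
-- Hypothetical sample data: four sessions for one application
CREATE TABLE #S
(
  app       VARCHAR(10) NOT NULL,
  starttime DATETIME    NOT NULL,
  endtime   DATETIME    NOT NULL
);

INSERT INTO #S(app, starttime, endtime) VALUES
  ('app1', '20090212 08:00', '20090212 09:00'),
  ('app1', '20090212 08:30', '20090212 09:30'),
  ('app1', '20090212 09:00', '20090212 10:00'),
  ('app1', '20090212 09:15', '20090212 09:45');

-- Show rownum, n, and cnt side by side for each session start
WITH Ends AS
(
  SELECT app, endtime,
    RANK() OVER(PARTITION BY app ORDER BY endtime) AS n
  FROM #S
)
SELECT S.starttime,
  ROW_NUMBER() OVER(PARTITION BY S.app ORDER BY S.starttime) AS rownum,
  A.n,
  ROW_NUMBER() OVER(PARTITION BY S.app ORDER BY S.starttime)
    - A.n + 1 AS cnt
FROM #S AS S
  CROSS APPLY (SELECT TOP (1) E.n
               FROM Ends AS E
               WHERE E.app = S.app
                 AND E.endtime > S.starttime
               ORDER BY E.endtime) AS A
ORDER BY S.starttime;

-- starttime  rownum  n  cnt
-- 08:00      1       1  1
-- 08:30      2       1  2
-- 09:00      3       2  2  (the 08:00 session ended at 09:00, so it isn't counted)
-- 09:15      4       2  3

DROP TABLE #S;
```

The largest cnt value, 3, is the maximum number of concurrent sessions for this application.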

What’s left for the outer query against the CTE 
Counts to do is simply to group the data by app, and 
return for each app the max cnt. The execution plan 
for the second part of the solution appears in the 
section called Query 2 in Figure 1. First, the index on 
Sessions(app, starttime, endtime) is scanned in order. 
For each row (session) returned from this scan, the 
plan performs an Index Seek operation in the clustered 
index on the #Ends(app, endtime) table to retrieve the 
n value from the first session for the current application 
with an end time greater than the current session's start 
time. The cost of this seek operation in terms of I/O 
is as many page reads as the number of levels in the 
index (three, in our case). The plan also calculates row 
numbers (Segment and Sequence Project operators), 
which are used along with n to calculate cnt. Finally, 
the plan uses a Stream Aggregate operator that relies 
on the ordered scan of the index on the Sessions table, 
to calculate the max cnt per application. 

What's important about the solution is that it has linear complexity
due to the constant cost per row. The cost is constant per row in
both parts of the solution. In the first part, both the calculation
of the rank values and the insertion to the target clustered index
are based on the existing ordering of the index on Sessions. In the
second part, there's a constant cost per row, including the portion
of the scan of the index against Sessions, the index seek, and the
aggregate. The index seek is the most expensive part of the plan: you
pay as many reads per row in Sessions as the number of levels in the
index (three, in our case). The cursor-based solution is less
expensive in terms of I/O cost; however, due to the extra overhead
that cursors incur for each record manipulation compared with
set-based solutions, the new set-based solution performs better
overall than the cursor-based one.

Figure 1: Execution plan for new set-based solution. Query 1 (the
INSERT into #Ends) accounts for 17% of the batch cost and Query 2
(the query against the Counts CTE) for 83%; within Query 2, the
Clustered Index Seek against #Ends shows a cost of 93%.

Figure 2 shows benchmark results for all the solutions’ performance
(the old set-based solution, the cursor-based solution, and the new
set-based solution). The old set-based solution has quadratic
complexity, whereas the cursor-based solution and new set-based
solution have linear complexity. The run time for the new set-based
solution is about half the run time of the cursor-based one.


Figure 2: Benchmark results (Concurrent Sessions Benchmark; run time
plotted against Num Rows for the Set-Based Original, Cursor-Based,
and Set-Based New solutions)


Solution Based on Window Aggregate Functions
Although you now have a set-based solution that performs better than
the cursor-based solution and scales linearly, the I/O cost is still
high due to the Index Seek operation required per row from the
Sessions table. As an example, if you have 1,000,000 rows in the
table, you pay 3,000,000 reads for the Index Seek operations.

A set-based solution exists that has the potential to outperform the
other solutions; however, this solution relies on standard language
elements that aren't yet implemented (as of SQL Server 2008). The
solution uses window ordering (the ORDER BY clause) and framing (the
ROWS clause) for window aggregate functions.

The logic of this solution is actually the same as the logic for the
cursor-based solution from last month. But instead of using a cursor
to calculate a running aggregate of the event type (the +1 or -1
representing whether a session starts or ends), you rely on a window
aggregate function to calculate the running aggregate. Listing 2
contains the solution (remember that you can't run it because it
isn't supported yet).

The query to define the CTE Events is the same as the one to define
the cursor. This query unifies the set of start events of sessions
with the set of end events of sessions. A +1 value is assigned as the
event type (event_type attribute) for the start of a session, and a
-1 is assigned as the event type for the end of a session. The code
then defines a second CTE called Counts that is based on a query
against Events. This query uses a SUM OVER aggregate function,
calculating the sum of the event_type values, partitioned by app,
ordered by ts (timestamp) and event_type. The ROWS clause frames the
applicable window for the calculation. For the defined partition with
the defined ordering, the frame is all rows with no low boundary
point and until the current row. The result of this calculation is
the number of concurrent sessions during the current timestamp
(called cnt). Finally, the outermost query simply groups the rows
from the CTE Counts by app, and returns the max cnt per application.

We can't examine this solution’s plan, because SQL Server doesn’t
support it yet. If such support is introduced, the plan will likely
involve an ordered scan of an index on (app, starttime) plus an
ordered scan of an index on (app, endtime), and both the calculation
of the window aggregate and the grouping will rely on that ordering.
So it’s a matter of scanning the data twice in index order, with no
explicit sorting, expensive cursor overhead, or expensive Index Seek
operations. This solution is likely to outperform all the others.


LISTING 2: Solution Based on Window Aggregate Functions

WITH Events AS
(
  SELECT app, starttime AS ts, 1 AS event_type FROM dbo.Sessions
  UNION ALL
  SELECT app, endtime, -1 FROM dbo.Sessions
),
Counts AS
(
  SELECT app,
    SUM(event_type) OVER(PARTITION BY app
                         ORDER BY ts, event_type
                         ROWS BETWEEN UNBOUNDED PRECEDING
                                  AND CURRENT ROW) AS cnt
  FROM Events
)
SELECT app, MAX(cnt) AS mx
FROM Counts
GROUP BY app;


Always Seek a Better Solution
Even if you can’t find a good performing set-based solution for a
task, it doesn’t mean that such a solution doesn't exist. Revisit
problems and look for new insights that can lead to new,
better-performing solutions. Hopefully Microsoft will enhance the
OVER clause and include the missing elements for window aggregate
functions. Such enhancements will allow much simpler and more
efficient solutions in the future.

InstantDoc ID 102926
Use these T-SQL queries to gather useful data regarding your
SharePoint environment


When it comes to Microsoft network service applications, like
SharePoint, a pretty package and simple interface mask the complexity
of what the software is really doing. Many SQL Server DBAs have been
challenged and confounded by their responsibilities to SharePoint
databases that seem to magically appear overnight. This article will
take some of the sting out of administering SharePoint content
databases in a SQL Server instance. First we'll take a quick peek at
the schema of the mysterious Windows SharePoint Services (WSS)
content database and identify some objects ripe for querying. Then
we'll explore specific T-SQL queries that can be used to garner
useful details about a SharePoint environment. Finally, we'll take a
look at the potential dangers of altering the content database via
SQL Server.

It's important to note that everything you're about to read goes
against Microsoft best practices, and for good reason. Messing around
with SharePoint databases directly in SQL Server can cause stability
and security problems in SharePoint, as well as prevent successful
troubleshooting and support when you need it from Microsoft.
Carefully consider these risks before employing any of the outlined
procedures in a production SharePoint environment. Remember, it’s all
fun and games until something (the server) gets hurt.


SharePoint's Content Database Schema

Of all the SharePoint databases in a WSS 3.0 or Microsoft Office
SharePoint Server (MOSS) 2007 farm, the content database is by far
the most volatile. In fact, the content database sees so much action
its default recovery model (Full) makes its transaction log file a
prime suspect when it comes to storage depletion. You would think a
database that's so popular among routine SharePoint transactions
would have a wealth of documentation written about it. However,
that's not the case. In fact, because Microsoft recommends all
interaction with SharePoint databases be conducted either through the
SharePoint GUI or via programming against the SharePoint object
model, there's little explanation of the content database’s
structure.

Identifying SharePoint’s content databases in a SQL Server instance
is fairly easy if the default database name of WSS_Content({GUID})
was generated by SharePoint Central Administration. However,
administrators can also create custom names for content databases, so
you might need to peek inside a database to determine if it is, in
fact, a SharePoint content database. So let’s break things down a bit
by starting with some user-defined tables that can be found in a
SharePoint content database. Each table serves a particular purpose
and several of them would seem to be related, yet few referential
integrity connections exist. For example, the three most important
tables concerning document libraries all have primary keys and
indexes but no foreign key relationships with one another (see
Figure 1). Suffice it to say that the WSS content database doesn’t
use normalization to its advantage, which makes writing direct T-SQL
queries into it a distinct challenge. Furthermore, most of the row
data values are identification numbers (some object GUIDs, others
internally generated), and although the columns containing these
numbers make excellent reference choices for JOIN statements, the
numbers themselves aren’t easily recognizable to human users.

So how do you determine which tables hold the information you need to
get out of SharePoint? Relying on table names isn’t the best method
because some of the names are misleading. For instance, the
AllDocVersions table appears to contain different versions of
documents held in a library that has versioning enabled. However, the
actual documents are binary large objects of the image data type held
in the AllDocStreams table’s Content column with a reference to the
version ID number from the AllDocVersions table that corresponds to
the current version of the document. Talk about confusing! Table 1
outlines some of the more useful tables in SharePoint's content
database that might be ripe for querying.

Figure 1: A database diagram showing the AllDocVersions, AllDocs, and
AllDocStreams tables (columns visible in the diagram include
DeleteTransactionId, ParentId, Size, Level, Content, MetaInfoSize,
Version, UIVersion, CacheParseId, and DocFlags)

It would seem from Table 1 that SharePoint fails to follow a tried
and true Microsoft best practice in SQL Server: Composite primary
keys should be created sparingly and only when absolutely necessary.
This SQL Server best practice stems from the theory that composite
primary keys lengthen the key values of the corresponding index,
resulting in less efficient query optimization and disk utilization.
However, since it would be inadvisable to alter the table structure
of SharePoint's content database, we'll just have to assume the
SharePoint product team had their reasons for violating SQL Server
best practices and leave the indexes as they lay. Furthermore, the
tables listed with an asterisk next to their name in Table 1 have
dependent View objects of the same name without the “All” prefix. For
example, the Docs view retrieves rows from the AllDocs table. These
views retrieve the entire column set of all rows not marked for
deletion. So querying the view instead of the underlying table would
retrieve a smaller result set but you could be missing desired rows
during a salvage operation. These views support the relational data
integrity of rows marked for deletion while also making it possible
to recover previously deleted items such as by using the Recycle Bin.

None of the table objects listed in Table 1 contain any foreign key
constraints, and as you can see from the Primary Key columns, almost
all of them contain duplicate data. The content database is highly
denormalized, a condition that enhances OLAP processing. Performance
indicators will show more efficiency during read operations than
during write operations. Furthermore, the dependencies of these
tables list a bevy of stored procedures and functions that SharePoint
employs for error control and row manipulation. Any direct queries
into these tables should be written so as to eliminate duplicate or
unrelated row data in the result set.


Querying the Content Database
Now that we’ve seen a few tables that contain useful information
about our SharePoint environment, let’s take a look at writing T-SQL
queries directly into them. Keep in mind that the same results could
be obtained programmatically by writing .NET code against the
SharePoint object model, but for many SharePoint admins and SQL
Server DBAs, a new Query window in SQL Server Management Studio
(SSMS) is sometimes quicker than tasking the developers in IT with a
new project. Here are just a few common scenarios that can be easily
resolved by simple T-SQL queries.


Figure 2: Sample tp_fields column results

12.0.0.4518.0.0<FieldRef Name="ContentTypeId"/><FieldRef Name="Title"
ColName="nvarchar1"/><FieldRef Name="_ModerationComments" ColName="ntext1"/><FieldRef
Name="File_x0020_Type" ColName="nvarchar2"/><FieldRef Name="LastNamePhonetic"
ColName="nvarchar3"/><FieldRef Name="FirstName" ColName="nvarchar4"/><FieldRef
Name="FirstNamePhonetic" ColName="nvarchar5"/><FieldRef Name="FullName"
ColName="nvarchar6"/><FieldRef Name="Email" ColName="nvarchar7"/><FieldRef Name="Company"
ColName="nvarchar8"/><FieldRef Name="CompanyPhonetic" ColName="nvarchar9"/><FieldRef
Name="JobTitle" ColName="nvarchar10"/><FieldRef Name="WorkPhone"
ColName="nvarchar11"/><FieldRef Name="HomePhone" ColName="nvarchar12"/><FieldRef
Name="CellPhone" ColName="nvarchar13"/><FieldRef Name="WorkFax"
ColName="nvarchar14"/><FieldRef Name="WorkAddress" ColName="ntext2"/><FieldRef
Name="WorkCity" ColName="nvarchar15"/><FieldRef Name="WorkState"
ColName="nvarchar16"/><FieldRef Name="WorkZip" ColName="nvarchar17"/><FieldRef
Name="WorkCountry" ColName="nvarchar18"/><FieldRef Name="WebPage" ColName="nvarchar19"
ColName2="nvarchar20"/><FieldRef Name="Comments" ColName="ntext3"/><Field Type="Number"
DisplayName="Years" Required="FALSE" ID="{3cd16be7-60a1-4e82-9404-8ecb851cd704}"
SourceID="{5406313d-3a5b-45b5-9cc2-b09dd502e24d}" StaticName="Years" Name="Years"
ColName="float1" RowOrdinal="0"><Default>10</Default></Field>


Determining Which Site Template Was Used
In an existing SharePoint environment, it can often be difficult to
determine which site templates were used to generate the sites,
especially if administrators have customized the pages or added and
deleted lists, libraries, and Web Parts. A quick query to the Webs
table of the content database will reveal template and configuration
information that can be translated by reading the
appropriate XML file contained in SharePoint's “12 hive” (i.e.,
%SystemDrive%\Program Files\Common Files\Microsoft Shared\Web Server
Extensions\12 on the SharePoint server). This information will reveal
the templates used for every site in the database. Although
SharePoint’s default behavior is to generate a separate content
database for each web application, it’s possible to span a single web
application across multiple content databases or combine multiple web
applications into a single content database. The following query
reveals template information for all sites in the content database,
regardless of web application assignment:


SELECT Title, WebTemplate, ProvisionConfig
FROM [dbo].[Webs]


To determine the template of a particular site, use a WHERE clause to
filter the site by Title, Description, or Site ID number, as the
following command shows:


WHERE Title = 'SiteA'


The Web Template ID number and Provision Configuration ID number
returned by the query might not be recognizable at first. The more
friendly text name of the template can be garnered from one of the
web configuration XML files located in the 12 hive on the SharePoint
server. For example, WSS 3.0 site templates are listed in the
webtemp.xml file, while many of the MOSS 2007 templates are defined
in the webtempsps.xml file. With some experience, you'll be able to
identify the template by its ID and Configuration Option numbers and
not need to read these XML files.
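
As a minimal sketch of how the query and the filter fit together, the first statement below combines them into one complete statement, and the second groups the same columns to show how many sites use each template. The site title 'SiteA' is a placeholder, the N prefix assumes the Title column stores Unicode text, and the site_count alias is made up for illustration.

```sql
-- Template information for one site ('SiteA' is a hypothetical title)
SELECT Title, WebTemplate, ProvisionConfig
FROM [dbo].[Webs]
WHERE Title = N'SiteA';

-- Number of sites per template and configuration in this content database
SELECT WebTemplate, ProvisionConfig, COUNT(*) AS site_count
FROM [dbo].[Webs]
GROUP BY WebTemplate, ProvisionConfig
ORDER BY site_count DESC;
```

The grouped form is handy when you only need to know which template and configuration numbers to look up in the XML files.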


Figure 3: Sample results from querying for user names


TABLE 1: Commonly Accessed Tables of the Content Database

Table            Contents                                   Primary Key Columns (1)
AllDocs*         Document metadata from all libraries       SiteID, DirName, LeafName
AllDocVersions*  Version metadata from all libraries        SiteID, Id, Version
AllLists*        List metadata for all lists per site       tp_WebId, tp_ID
ContentTypes     Content type metadata from all galleries   SiteId, Class, Scope, ContentTypeID
EventCache       Events for which alerts are defined        n/a
Perms            Permissions stored in binary ACLs          SiteId, DelTransId, ScopeUrl
Sites            Site metadata for all sites in web app     n/a
WebParts         Web Part metadata for all galleries        tp_SiteId, tp_ID, tp_Level
Workflow         Workflow metadata for all workflows        Id, ListId, SiteId, WebId

(1) Clustered indexes exist on the primary key columns of all primary
key constraint-bearing tables, except the WebParts table, which has a
non-clustered index on its primary key columns.


Collecting Column Definitions

Imagine a scenario in which you suspect various lists have columns
that are too generous with their storage size and you need to quickly
determine the construction of all columns from a particular list,
library, or gallery to prove it. Visiting the properties of each
column individually in the GUI is too time-consuming, and writing
code might be too complex. However, simply querying the AllLists
table in the content database can reveal column information about any
and all lists throughout your SharePoint environment. The AllLists
table contains a row for each list, library, and gallery throughout
the logical portion of SharePoint that uses the given content
database. Of the many columns in this table, the tp_fields column
(ntext data type) contains detailed information about all of the
columns in that row’s particular list, library, or gallery. You can
use the following code to query the tp_fields column:


SELECT tp_fields
FROM [dbo].[AllLists]


Figure 2 shows the output from this command laid out like an XML
file, with a separate tag for each column divulging details such as
column type, whether a value is required, size limit, and default
assignment. However, finding the column of interest can be a bit
difficult if the same column name is used in more than one list. It's
best, if possible, to isolate this query to a particular list,
library, or gallery by using a WHERE clause to avoid misinformation.
Also keep in mind that although the values in the tp_fields column
look like XML, they are actually ntext strings, so if you need to
extract only one column’s worth of information you'll need an
expression such as substring().
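
As a hedged sketch of the substring() approach for the tp_fields column, the query below casts the ntext value to NVARCHAR(MAX) and returns a fragment that starts at one field's definition. The search text DisplayName="Years" comes from the sample field in Figure 2; the tp_Title column, the 300-character length, and the field_fragment alias are assumptions made for illustration.

```sql
-- Sketch: return a 300-character fragment of tp_fields starting at
-- the definition of the field whose display name is "Years".
-- The cast is needed because tp_fields is ntext, which CHARINDEX
-- doesn't accept directly.
SELECT tp_Title,
  SUBSTRING(F.fields,
            CHARINDEX(N'DisplayName="Years"', F.fields),
            300) AS field_fragment
FROM [dbo].[AllLists]
  CROSS APPLY (SELECT CAST(tp_fields AS NVARCHAR(MAX)) AS fields) AS F
WHERE F.fields LIKE N'%DisplayName="Years"%';
```

The WHERE clause keeps only the lists that actually contain the field, so CHARINDEX never returns 0 for the rows in the result.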
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TABLE 2: Frequently Executed Stored Procedures 


Stored Procedure Name: proc_AddContentTypeToScope
Purpose: Adds a custom content type to a gallery
Spawns: fn_IsOverQuotaOrWriteLocked, proc_SplitUrl, proc_EscapeForLike,
proc_LogContentTypeChange, proc_UpdateDiskUsed,
proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_AddDocument
Purpose: Adds a new document to a library
Spawns: fn_IsOverQuotaOrWriteLocked, fn_RoundDateToNearestSecond,
proc_CanonicalDirNameFromUserInput, proc_GetAttachmentParentScopeId,
proc_SplitUrl, proc_CreateDir, proc_GetUniqueFileName,
proc_UpdateChildCount, proc_LogChange, proc_ResyncWelcomeLinks,
proc_UpdateAttachmentsFlag, proc_AddAuditEntryFromSql,
proc_GetLockInfo, proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_AddListItem
Purpose: Adds a new item to a list
Spawns: fn_IsOverQuotaOrWriteLocked, fn_RoundDateToNearestSecond,
proc_GenerateNextId, proc_GetTargetOrderNumber,
proc_CanonicalDirNameFromUserInput, proc_GetUniqueFileName,
proc_AddDocument, proc_CreateDir, proc_PostProcessAddMtgListItem,
proc_AddEventToCache, proc_SecChangeToUniqueScope,
proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_CreateSite
Purpose: Creates a new site
Spawns: fn_RoundDateToNearestSecond, proc_LogChange,
proc_SecAddPermScopeForWeb, proc_CreateWebNavStruct,
proc_ProvisionWeb, proc_CreateDefaultRoles,
proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_DeleteRecycleBinItem
Purpose: Purges an item from the Recycle Bin
Spawns: proc_DeleteFromNVP, proc_GetCollation, proc_AutoDropWorkflows,
proc_UpdateDiskUsed, proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_DeleteSite
Purpose: Deletes a site from Sites
Spawns: proc_DeleteSiteInternal, proc_LogChange

Stored Procedure Name: proc_DeleteSiteInternal
Purpose: Deletes site objects
Spawns: proc_DeleteFromNVP, proc_UpdateDiskUsed

Stored Procedure Name: proc_UpdateListItem
Purpose: Revises a list item
Spawns: fn_IsOverQuotaOrWriteLocked, fn_RoundDateToNearestSecond,
proc_CloneDoc, proc_ChangeLevelForDoc, proc_CreateDocVersion,
proc_PostProcessUpdMtgListItem, proc_ManageVersions,
proc_GetAuditMaskOutput, proc_AddEventToCache,
proc_QMChangeSiteDiskUsedAndContentTimestamp

Stored Procedure Name: proc_UpdateView
Purpose: Repopulates a view
Spawns: proc_OnUpdateWebParts, proc_MakeViewMobileDefaultForList,
proc_EnsureMobileDefaultViewForList,
proc_MakeViewDefaultForContentType, proc_LogChange


Gathering Security Information
It might be valuable to know who is accessing the databases you support.
Although SharePoint maintains its own security architecture, from an
audit and logging perspective, it would be nice to quickly see the
security principals and permission assignments SharePoint is using to
grant access to the content database. For example, say you just want a
quick and dirty list of all users who have access to the sites in
SharePoint. Not individual permission levels or anything, just a list of
names. Executing the query

    SELECT dbo.Webs.FullUrl, dbo.Webs.Title,
           dbo.UserInfo.tp_ID, dbo.UserInfo.tp_Title
    FROM dbo.UserInfo JOIN dbo.Webs
      ON dbo.UserInfo.tp_SiteID = dbo.Webs.SiteId

returns results similar to those shown in Figure 3, page 29.

There are many more queries that will retrieve information directly from
SQL Server about your SharePoint environment. Once you become familiar
with the tables from the content database listed in Table 1, page 29,
you'll find all kinds of new ways to decipher the myriad ID numbers
throughout the rows to join descriptive titles from other tables and
produce recognizable result sets.


A Word of Warning

The T-SQL statements we've looked at so far have all been relatively
harmless. The true “moving parts” of a SharePoint content database are
the stored procedures and functions that manipulate the table rows.
Although it might be tempting to get underneath the hood of these
objects, be aware that altering a stored procedure could cause otherwise
dormant triggers to fire and disabling corruption to ensue. The stored
procedures and functions shown in Table 2 are best left alone in
production SharePoint environments. If you really want to see what a
stored procedure is doing, consider scripting it to a new query
window or file in SSMS.


Explore These Objects in a Test 
Environment 
This article has shed a bit of light on the schema of 
the WSS content database and identified some objects 
ripe for querying. There’s much more to the SharePoint 
databases, far more than can be covered in a single 
article, but the objects outlined herein should give you a 
starting point. Exploring these objects via SQL Server 
should always be performed in a development or testing 
environment and never in production. A simple slip of a 
mouse click could render a stored procedure capable of 
corruption, so be careful. For more information about 
these and other SharePoint objects in SQL Server, see 
the WSS library available at msdn.microsoft.com.
InstantDoc ID 102848


Passing NULL Parameters


How to handle the unknown 


A question I often see in public and MVP newsgroups (where I spend far
too much time) is, “I have a stored procedure that can accept a NULL
value as a parameter, but it doesn't seem to work correctly. What am I
doing wrong?” The easiest way to answer this question is with examples.
Although the following examples use SQL Server 2008 and ADO.NET 3.5,
most of the techniques I'll be showing you will work with earlier
versions of SQL Server and ADO.NET (even classic ADO).

To make this simple, let's create the dbo.GetProductsByShipDate stored
procedure in Listing 1 in the AdventureWorks2008 sample database. In
this stored procedure, the only input parameter is a date that has the
new data type of date. This data type has no time component, which makes
testing to see whether a value is an exact date far simpler. Although
this data type is new to SQL Server 2008, the dates stored in the
AdventureWorks2008 database have the old-fashioned datetime data type,
so the stored procedure performs the conversion server-side.

As you can see in Listing 1, the input parameter 
is set to NULL if no value is sent by the code that 
invokes the stored procedure. Actually, that’s inac- 
curate. The default value is used if a parameter isn’t 
passed in. For example, as Figure 1 shows, the T-SQL 
query processor knows to substitute NULL for the 
@DateWanted value when the stored procedure is 
invoked with no parameters. In other words, by not 
passing in a parameter, you're forcing the use of the 
default parameter value. 
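The behavior described above can be checked directly in a query window. The following is a minimal sketch, assuming the Listing 1 procedure has already been created in AdventureWorks2008. The date literal is an arbitrary illustrative value, not one taken from the article.

```sql
USE AdventureWorks2008;
GO

-- 1. Parameter omitted: the declared default (NULL) is used, so the
--    (@DateWanted IS NULL) branch of the WHERE clause matches every row.
EXEC dbo.GetProductsByShipDate;

-- 2. The DEFAULT keyword requests the declared default explicitly.
EXEC dbo.GetProductsByShipDate @DateWanted = DEFAULT;

-- 3. An explicit NULL: same outcome here only because the default is also NULL.
EXEC dbo.GetProductsByShipDate @DateWanted = NULL;

-- 4. A real value: only products whose SellStartDate falls on that day.
--    '20020601' is an arbitrary example date.
EXEC dbo.GetProductsByShipDate @DateWanted = '20020601';
```

If the procedure's default were something other than NULL, calls 1 and 2 would still agree with each other, but call 3 could return a different result.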

Note that if you'd like to try running the dbo 
.GetProductsByShipDate stored procedure, you can 
download it (as well as the other code examples pre- 
sented here) by going to www.sqlmag.com, entering 
102592 in the InstantDoc ID text box, and clicking 
the 102592.zip hotlink. If you’re running SQL Server 
2005 or earlier, you need to change the date data type 
to datetime. 
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Forcing the Use of Default 
Parameter Values

Being able to force the use of default parameter values presents
developers with a handy option when designing applications that use
ADO.NET to execute stored procedures with optional named parameters. If
you want a stored procedure to use the procedure-specified default
value, simply don't generate a parameter for the SqlCommand object
that's executing the stored procedure. (Be sure to specify the default
parameter value in the stored procedure; otherwise, the code will throw
an exception.) This means that you can build logic that bypasses the
creation of any parameter when you want the server-side query processor
to use the specified default value. An example of this approach is shown
in Listing 2, page 34.

Admittedly, this approach isn't very practical for stored procedures
with a lot of optional parameters because your developers will have to
come up with a state machine to figure out which Parameter objects to
build and which to populate. Of course, that's not that hard to do once
you figure out a good strategy.


MORE on the WEB

Download the code at
InstantDoc ID 102592.


William Vaughn

(billva@betav.com) is an expert on Visual
Studio, SQL Server, Reporting Services, and
data access interfaces. He's coauthor of the
Hitchhiker's Guide series, including Hitchhiker's
Guide to Visual Studio and SQL Server, 7th ed.
(Addison-Wesley).


LISTING 1: The dbo.GetProductsByShipDate
Stored Procedure

    CREATE PROCEDURE dbo.GetProductsByShipDate
        @DateWanted date = NULL
    AS
    SELECT Name, Color, StandardCost, ListPrice, SellStartDate
    FROM Production.Product
    WHERE (CONVERT(date, SellStartDate) = @DateWanted) OR (@DateWanted IS NULL)
    RETURN


Figure 1: Results from executing dbo.GetProductsByShipDate with no
parameters (query: EXEC dbo.GetProductsByShipDate; the visible rows
include Adjustable Race and Bearing Ball, both with a NULL Color)
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LISTING 2: Executing the Stored Procedure with 


No Parameters 


    ' cn, cmd, and strConnection are assumed to be declared at the form level.
    Private Sub btnQuery2_Click(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles btnQuery2.Click
        Try
            cn = New SqlConnection(strConnection)
            Using cn
                cmd = New SqlCommand("dbo.GetProductsByShipDate", cn)
                cmd.CommandType = CommandType.StoredProcedure
                ' Pass no parameters; the parameter is set to the default value (NULL).
                cn.Open()
                Dim dr As SqlDataReader, dt As New DataTable
                dr = cmd.ExecuteReader(CommandBehavior.CloseConnection)
                dt.Load(dr)
                DataGridView1.DataSource = dt
            End Using
        Catch ex As Exception
            MsgBox(ex.Message)
        Finally
            cn.Close()
        End Try
    End Sub


Figure 2: UI for dbo.GetProductsByShipDate


Dealing with NULL Values in an 
Application’s Logic 

Users don't always fill in all the input fields in applications' UIs,
which can cause problems. For example, suppose the UI for the
application that runs the dbo.GetProductsByShipDate stored procedure has
a single input field in which users are supposed to enter the desired
shipping date. In this scenario, does a blank field mean that the user
simply forgot to fill in the field, or does it mean that the user wants
to pass in NULL? Asking users to enter NULL when applicable wouldn't be
a viable option because most users don't understand the concept of NULL.

If your application needs to decide at runtime whether a user simply
forgot to fill in a field or wants to pass in NULL, a better solution
would be creating a UI like that in Figure 2. When a user enters a date
in the Date Shipped input box, the application executes the
dbo.GetProductsByShipDate stored procedure, using that date as the input
parameter. If the Not Shipped Yet check box is selected, NULL is used as
the input parameter.

Regardless of how your UI works, it's up to you to decide how to set the
parameter value passed to the query processor. For example, there are a
couple of options for the UI in Figure 2:

• Option 1: Set the parameter value to NULL.
  Test for a checked state in the Not Shipped Yet check box. If that
  check box is selected, set the @DateWanted parameter value to
  DBNull.Value or Nothing in Visual Basic .NET code or null in C# code.
  Otherwise, use the date in the Date Shipped input box for the
  @DateWanted parameter value. The code in Listing 3 shows the use of
  both DBNull.Value and Nothing. Both lines produce the same result
  here because the stored procedure's default for @DateWanted is also
  NULL.
• Option 2: Bypass the creation of the parameter.
  Test for a checked state in the Not Shipped Yet check box. If that
  check box is selected, bypass the creation of the parameter for the
  SqlCommand object. Otherwise, use the date in the Date Shipped input
  box for the @DateWanted parameter value. The code in Listing 4
  demonstrates this approach. This code depends on having the default
  parameter value set in the stored procedure definition. Although
  coupling your UI to the database doesn't follow best practices, this
  option is easy to understand and manage.


LISTING 3: Setting the Parameter Value to NULL

    With cmd
        .CommandType = CommandType.StoredProcedure
        .Parameters.Add("@DateWanted", SqlDbType.Date)
        If cbUseNull.Checked = True Then
            ' Either of these approaches will work (in VB.NET)
            ' .Parameters("@DateWanted").Value = DBNull.Value
            .Parameters("@DateWanted").Value = Nothing
        Else
            .Parameters("@DateWanted").Value = CDate(txtDateInput.Text)
        End If
    End With


Note that not all of the new SQL Server 2008 data 
types are exposed in Visual Studio 2008 SP1. Although 
I could see the parameter value types in an enumerated 
list, when I tried to use the Type.parse method, only the 
old types were available. Fortunately, the Visual Studio 
2008 SP1 development tools knew how to enumerate 
the new SqlDbType data types. 


Handling the Unknown 
There are many situations in which an input param- 
eter value might not be provided. In such cases, 
you have to be careful when including a value that 
essentially says “we don't know what the value is.” 
As you've seen here, it’s possible to handle NULL 
parameters in several ways. Given that many stored 
procedures have dozens of input parameters, it's
handy to know how you can invoke them without
having to set a value for each and every parameter.
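To illustrate the point about procedures with many optional parameters, here is a minimal T-SQL sketch. The procedure dbo.FindProducts and its parameter names are hypothetical (they aren't part of the article's download). It extends the Listing 1 pattern to three optional filters, and each caller names only the parameters it cares about.

```sql
USE AdventureWorks2008;
GO

-- Hypothetical procedure: every parameter is optional and defaults to NULL,
-- which this sketch treats as "don't filter on this column".
CREATE PROCEDURE dbo.FindProducts
    @Color      nvarchar(15) = NULL,
    @MinPrice   money        = NULL,
    @DateWanted date         = NULL
AS
SELECT Name, Color, StandardCost, ListPrice, SellStartDate
FROM Production.Product
WHERE (@Color IS NULL OR Color = @Color)
  AND (@MinPrice IS NULL OR ListPrice >= @MinPrice)
  AND (@DateWanted IS NULL OR CONVERT(date, SellStartDate) = @DateWanted);
GO

-- Named parameters let a caller supply any subset; the rest use their defaults.
EXEC dbo.FindProducts @MinPrice = 100;
EXEC dbo.FindProducts @Color = N'Red', @DateWanted = '20020601';
EXEC dbo.FindProducts;   -- no filters at all
```

On the ADO.NET side, the equivalent is to add a Parameter object only for the values the user actually supplied.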
InstantDoc ID 102592 


LISTING 4: Bypassing Parameter Creation

    With cmd
        .CommandType = CommandType.StoredProcedure
        If cbUseNull.Checked = True Then
            ' Do not pass the named parameter at all.
        Else
            .Parameters.Add("@DateWanted", SqlDbType.Date)
            .Parameters("@DateWanted").Value = CDate(txtDateInput.Text)
        End If
    End With


Plan for levels of availability and speed of retrieval

Even SQL Server professionals who hear the term “storage design” might
think of closets and garages, not data, databases, or data warehouses.
However, if you try to build a data warehouse without paying attention
to the way the storage is laid out, I guarantee you'll be back to the
drawing board within a few months, trying to figure out what went wrong.

Data warehouse performance is tied to the performance of the underlying
storage subsystem. You can design a storage subsystem for maximum
performance, or you can take the default and hope it works out. You're
building a high-maintenance system. The selling point of a data
warehouse is its timeliness, so every organizational change, every new
product line and launch must be reflected there. You (or your staff)
will be spending about 80 percent of your time dealing with the
extraction, transformation, and loading (ETL) issues that are part and
parcel of a data warehouse, so getting the data storage right in the
beginning is going to save you many sleepless nights.


Categories of Data

Not all data is created equal, not even in a data warehouse. In most
situations, there are three categories of data warehouse data: high-,
moderate-, and low-availability. While there are no hard-and-fast rules
governing how available data should be, there's a direct relationship
between availability and usability. High-availability data is data
that's needed for current business analysis, such as the numbers that
reflect the success or failure of a new product launch.
Moderate-availability data is older data that's used occasionally, for
example, to compare current performance to past performance, such as a
comparison of this year's first quarter (Q1) sales results to the past
five years of Q1 sales. Low-availability data is data with a very
limited use, such as information about discontinued products. Once data
slips off the radar, you need to think about whether you even need to
access it. You'll also need to make decisions about long-term archiving
or even disposal.

The level of availability dictates the speed of 
retrieval. Data that needs to be readily available 
must be configured for speedy retrieval. The faster 
your retrieval requirements are, the lower the storage 
density will be. This translates to cost decisions you 
must make. You must weigh higher costs for high- 
performance storage against lower-cost high-density 
storage. It would be wonderful if you could find 
storage that’s both high-performance and high-density 
at the high-density cost, but I’m not aware that such 
an animal exists—yet. Let’s start at the beginning, with 
optimizing storage for high-availability, high-usability 
data: This is the data that reflects events such as new 
product launches. 


Using the 7D Method to Plan 
for Storage Configuration 
Determining a storage configuration for a data ware- 
house is like any other project; it’s a multistep process. 
You can adapt my trademarked 7D Method to this task. 
To learn more about the 7D Method, see “Seven Steps 
for Successful Data Warehouse Projects,” April 2009, 
InstantDoc ID 101562. Using the 7D Method steps, 
first, you must Discover how and why the data ware- 
house will be used. You'll have to balance performance 
expectations with availability and cost constraints, and 
you'll have to determine how much fault-tolerance you 
need to build into the warehouse. Second, you need to 
Design the configuration. Third, you must Develop 
the plan that will make the configuration a reality and 
choose the components. Fourth, you'll Deploy the con- 
figuration plan. Fifth, you'll manage the Day-to-Day 
operation of the data warehouse, which from a data 
storage perspective involves loading, querying, backing 
up, and restoring data. The sixth and seventh steps, 
Defend and Decommission, are beyond the scope of 
this discussion, but don’t forget them. 
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Discover the Essential Storage 
Components 

To discover how data warehouse storage will work you'll 
need to consider things such as the primary query access 
method for the data warehouse, and whether it will be 
sequential or random; how much temporary storage will 
be used and how often; how new data will be introduced 
into the warehouse (i.e., by bulk updates or the “trickle”
method); whether new data will simply be added to 
the warehouse, or whether existing data will be edited, 
either by overwriting it or by creating an “old value” 
column in the dimension record to store the previous 
value. (For more information about these methods for 
dealing with change, see “Data Warehousing: Slowly 
Changing Dimensions,” January 2008, InstantDoc ID 
97409.) Each option adds a new condition to the pro- 
visioning picture. 

During this discovery process, you'll list the essen- 
tial physical components of the data warehouse, which 
should include the database; the data files that contain 
the actual data records and indexes that reference data 
records; the transaction log, which records additions 
and modifications to the data in sequential order, 
among other things; temporary storage (tempdb), 
which stores intermediate results and temporary tables 
during query processing; the database backup area, 
whether disk or tape; and system files, which include 
but are not limited to the OS and its swap files, the 
database management system executables, and any 
application DLLs or executables that will reside on
the warehouse server. Storage for each of these distinct 
components can be treated differently, depending on 
how you deal with the considerations raised in the 
preceding paragraph. 

As you know, data is stored on disk, and disk 
drives fail. The very best way to protect data in the 
event of such a failure is to use a redundant array 
of independent disks (RAID). RAID technology 
distributes data across a series of disks to unite these 
physical devices into one higher-performing logical 
drive. By distributing the data, access is concurrent 
instead of sequential, yielding I/O rates faster than 
rates achieved from a single disk. There are many 
RAID configurations, but RAID 0 (striping), RAID 
1 (which includes RAID 0+1 and 1+0, all ways of 
mirroring the data), RAID 5 (independently storing 
data and parity—error correction data—separately), 
and Advanced Data Guarding (ADG), which is like 
RAID 5 on steroids, are used in the data warehouse 
arena. Because the data warehouse holds such critical 
information, you really should employ some type of 
fault-tolerant system. 

A typical data warehouse will employ two types 
of storage, DAS and SAN. There are many articles 
available that describe the pros and cons of each, but 
for our discussion it's enough to realize that DAS commonly
uses the 160MB/sec SCSI protocol and is local
to the server, while SAN uses the 2Gb/sec Fibre Channel
protocol and is storage shared across the network.


Design the Storage Subsystem 
for Querying 

After the Discovery phase you'll start the Design 
phase. Of the three major data warehouse opera- 
tions—loading, querying, and backup and restore— 
query performance is generally the most critical. If, 
in fact, query performance is more important to your 
warehouse than loading or backups, then design the 
storage subsystem for query retrieval. Rarely is a data 
warehouse dedicated to a single user (which is the 
only way, from a storage subsystem perspective, that 
query performance could be sequential), so you should 
design for random access. Even if you're running que- 
ries that perform sequential retrievals, if there’s more 
than one query running at the same time, this means 
that the disk drives are seeking in order to satisfy the 
multiple read streams, and that’s random access. Plan 
to use an array of physical disks to optimize retrievals, 
and, while you're at it, use parity-based RAID (RAID 
5 or ADG) for fault tolerance. 

How much and how often will temporary storage 
be used? Temp storage is random access, typically with 
50 percent reads and 50 percent writes; plan to con- 
figure tempdb as RAID 0 or RAID 10. If you're going 
to place tempdb on DAS, then limit the number of 
drives per SCSI bus to the vendor recommendation for 
maximum performance, which is usually six or seven. 

You can eke out a few more performance points 
if you can place the database and temporary storage 
on the same set of physical drives, but this may be dif- 
ficult if your RAID controller won't let you partition 
the physical array into multiple logical arrays with dif- 
fering RAID levels. Ideally, the RAID controller would 
let you define one logical RAID 5 or ADG array for 
the database and a second logical RAID 0 or 10 for 
tempdb. If that can’t happen, plan to place tempdb on 
a separate drive, or if the data warehouse doesn’t need 
as much fault tolerance as parity (RAID 5 or ADG) 
can give it, put the database and tempdb together, 
configured as RAID 10. 
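As a minimal sketch of placing tempdb on its own array, the statements below relocate its files. The drive letter T: and the folder are hypothetical and must already exist. The names tempdev and templog are the default logical file names on a standard installation, so check sys.master_files first if yours differ.

```sql
-- Check the current logical names and locations of the tempdb files.
SELECT name, physical_name
FROM sys.master_files
WHERE database_id = DB_ID(N'tempdb');

-- Point tempdb at a dedicated array (T:\ is a hypothetical RAID 0 or RAID 10 volume).
ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb
    MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');

-- The new paths take effect the next time the SQL Server service restarts.
```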

The transaction log is write-intensive and sequential; 
drive mirroring is a must. Plan to configure the transac- 
tion log as RAID 1 or 10, and put it on a drive separate 
from the data. System files should be mirrored (RAID 1 
or a variant thereof), and placed on DAS if it’s available, 
due to the faster access times. 
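The separation of data and log can be expressed when the database is created. This is a sketch under stated assumptions: the database name SalesDW, the drive letters, and the sizes are all hypothetical, with D: standing in for the parity-protected data array and L: for the mirrored log array.

```sql
-- Hypothetical layout: D:\ is the RAID 5 or ADG data array, L:\ the RAID 1 or 10 log array.
CREATE DATABASE SalesDW
ON PRIMARY
    (NAME = SalesDW_data, FILENAME = 'D:\SQLData\SalesDW.mdf',
     SIZE = 10GB, FILEGROWTH = 1GB)
LOG ON
    (NAME = SalesDW_log, FILENAME = 'L:\SQLLogs\SalesDW.ldf',
     SIZE = 2GB, FILEGROWTH = 512MB);
```

Pre-sizing both files, rather than relying on small autogrowth increments, keeps growth events out of the load and query windows.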


Design the Storage Subsystem 
for Data Loading 

Is data loading going to be a major factor in your data 
warehouse operations? How will you load the data? 
Will you use the trickle or continuous updating method, 
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which is common in synchronous or near-real-time data 
warehouses? See “Data Warehouse Workloads and Use 
Cases,” September 2008, InstantDoc ID 99653. Or will 
you use periodic bulk loading? The answer is usually 
associated with how much data latency your users can 
live with. 

In the trickle updating scenario, as the source data 
is updated, so is the data warehouse. This technique 
gives warehouse users the smallest latency delay. 
Compared to query processing, updating is a relatively 
minor event. A page read is done by a thread that 
directly services the user application; many indepen- 
dent read operations can be conducted concurrently. A 
database write, however, happens in cache; afterward, a 
database function will write the modified memory page 
to storage. The component that needs to be carefully 
configured in a trickle-load scenario is the transaction 
log. Plan to place the transaction log on a separate 
drive array from the data files, and make sure the 
transaction log is mirrored (RAID 1 or 10). 

Bulk updating using a simple recovery model (essen- 
tially, without persistent logging) and backing up after 
the bulk load completes is the norm in many, if not 
most, data warehouse operations. If you're planning to 
bulk-load the data warehouse and log the bulk opera- 


tions, expect the transaction log to be larger than the 
amount of data that you're loading—this is important 
when you're sizing the files. As with trickle-load, place
the transaction log on a separate drive array, mirrored. 
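The bulk-load routine described above (simple recovery, load, then back up) might be sketched as follows. The database SalesDW, the table dbo.FactSales, the file format, and every path are hypothetical names chosen for illustration.

```sql
-- Switch to the simple recovery model before the load window.
ALTER DATABASE SalesDW SET RECOVERY SIMPLE;

-- TABLOCK is one of the prerequisites for a minimally logged bulk load.
BULK INSERT dbo.FactSales
FROM 'S:\Staging\FactSales.dat'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);

-- Back up once the load completes; under simple recovery there are
-- no log backups to fall back on.
BACKUP DATABASE SalesDW
TO DISK = 'B:\Backups\SalesDW_postload.bak'
WITH INIT;
```

Here S: (source files) and B: (backups) are meant to be arrays separate from the data, log, and tempdb drives.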

Source data location can have an impact on storage 
subsystem performance. If the source data is coming 
in over a network and the network bandwidth is too 
limited for the data stream, then the performance 
bottleneck is the network. If possible, relocate a copy 
of the source data to a local or SAN drive, but preserve 
the sequential access by placing the source data on a 
drive array that’s not being accessed during the load 
operation. Don't let the source files share a drive with 
the transaction log, database backups, or temporary 
storage (tempdb). 

Are you presorting the data before loading it into 
the data warehouse? Do it, if you can. Otherwise, the 
database engine will be sorting it, and using either the 
destination file group or tempdb in the process. If you 
cannot presort the data, then set SORT_IN_TEMPDB 
to ON and configure tempdb as RAID 0 or 10, on its 
own array, separate from the database and the transac- 
tion log. 
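The two cases in this paragraph can be sketched in T-SQL as alternatives. The table dbo.FactSales, the column OrderDateKey, the index name, and the file path are hypothetical. The first statement assumes the table already has a clustered index on OrderDateKey, and the second assumes it has none.

```sql
-- Alternative 1, presorted source: declare that the file is already in
-- clustered-key order so the engine can skip its own sort.
BULK INSERT dbo.FactSales
FROM 'S:\Staging\FactSales_sorted.dat'
WITH (FIELDTERMINATOR = '|', ORDER (OrderDateKey ASC), TABLOCK);

-- Alternative 2, unsorted source: load first, then build the clustered
-- index and push the intermediate sort work into tempdb.
CREATE CLUSTERED INDEX CIX_FactSales_OrderDateKey
ON dbo.FactSales (OrderDateKey)
WITH (SORT_IN_TEMPDB = ON);
```

SORT_IN_TEMPDB only helps if tempdb sits on a different set of disks from the destination filegroup, which is why the separate array matters.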

When you're bulk-loading, is data simply being
added to the tables? If so, you can optimize for sequential 
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writes. If the data is actually being updated, as in 
the case of overwriting it or creating an “old value” 
column in the dimension record to store the previous 
value, each record will have to be located on disk before 
update, resulting in a random access model of bulk 
data loading. If this is the case, spreading the data over 
a larger number of drives may improve performance. 
And one more recommendation for this situation only: 
consider co-locating the data and tempdb on the same 
array, even assuming that tempdb will be heavily used. 
Data access will be randomized in any case, so elimi- 
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nating the call to a second array may gain you some 
performance points. 


Design for Backup and Restore 
Designing for data warehouse backup and restore boils 
down to one question: are you backing up to tape or to 
disk? If you're using a tape device, then you don’t need 
to do anything to the storage subsystem configuration. 
If you're backing up to disk, position the backup file 
on a separate drive array from any of the data ware- 
house components—the data, the transaction log, and 
tempdb. Configure the backup array as either RAID 
5 or ADG. ADG can write at approximately 20MB/ 
sec, so consider using multiple RAID controllers to 
ensure adequate throughput. 
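One way to spread backup writes across more than one controller is to stripe the backup over several files. This is a sketch, not a recommendation from the article: the database name and paths are hypothetical, with X: and Y: standing in for backup arrays attached to different RAID controllers.

```sql
-- Each DISK target receives part of the backup, so the writes are
-- spread across both arrays.
BACKUP DATABASE SalesDW
TO  DISK = 'X:\Backups\SalesDW_stripe1.bak',
    DISK = 'Y:\Backups\SalesDW_stripe2.bak'
WITH INIT, CHECKSUM;
```

All of the stripe files are needed together at restore time, so they should be managed as a single set.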


Using the 7D Method 

When using the 7D Method I recommend that you 
spend a lot of time in the Discover and Design 
stages. Based on your findings and plans, you can 
quickly Develop and Deploy your data warehouse 
storage subsystem. 

For more information about data warehouse 
configuration, see Microsoft’s new SQL Server 2008 
Fast Track Data Warehouse home page at
www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx.
Here you'll find resources on how to configure the 
entire data warehouse setup, from the CPU on out, 
with capacity calculators and balanced system sug- 
gestions. As noted in “Implementing a SQL Server 
Fast Track Data Warehouse,” msdn.microsoft.com/ 
en-us/library/dd459178.aspx, Frost et al. (2009), 
“The Fast Track approach is specifically focused 
on building scalable CPU core-balanced configura- 
tions to support SSDW sequential I/O data access 
workloads.” One caveat: The authors mention mul- 
tiple times the “sequential data access workloads” 
consistent with SQL Server data warehousing. This 
concerns me because the only way to ensure true 
sequential access at the storage subsystem is to 
single-thread queries. I’m sure this is not happening 
in any version of SQL Server. I’m also not comfort- 
able with their lack of discussion on fault tolerance, 
and their lack of options when recommending levels 
of RAID protection. 

The configuration suggestions I’ve made in 
this column are for high-access data, which requires 
a high-performance storage subsystem. Not all data 
in a data warehouse falls into this category, however, 
and it’s counter-intuitive to treat all data equally. 
Next time I'll cover the low-availability data, which 
should be stored on high-density, low performance 
but cost-effective storage subsystems. I'll also cover 
long-term archiving and disposal of data that has 
outlived its usefulness.

InstantDoc ID 102609 
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Your First 


Get at your BI data more easily by 
entering the world of data cubes 


If you've worked in any technology-related
field, you've probably heard the term cube

thrown around, but most traditional DBAs 
and database developers haven’t worked with them. 
Cubes are powerful data constructs for rapidly aggre- 
gating multidimensional data. If your organization 
wants to perform data analysis on large volumes of 
data, a cube is the ideal solution. 


What Is a Cube? 

Relational databases were designed to support thou- 
sands of concurrent transactions while maintaining 
performance and data integrity. By their very design, 
relational databases fall short in large volume data 
aggregation and retrieval. To aggregate and return 
large volumes of data, a relational database must 
receive a set-based query that asks for a set of data to 
be aggregated on the fly. These relational queries are 
very costly due to their reliance on multiple joins and 
aggregations, so relational aggregation queries perform 
poorly when operating on large data sets. 

Cubes are multidimensional entities that address 
this weakness in relational databases. With a cube, you 
can provide users with a data structure that facilitates 
rapid responsiveness for large-volume aggregation 
queries. Cubes perform this aggregation magic by 
pre-aggregating data (measures) across multiple dimen- 
sions. The cube’s pre-aggregation normally takes place 
when a cube is being processed. When you process a 
cube, you're creating pre-calculated aggregations of 
data that are stored in binary form on disk. 

A cube is the central data construct of an OLAP 
system such as SQL Server Analysis Services (SSAS). 
Cubes are usually constructed from an underlying rela- 
tional database called a dimensional model, but they're 
separate entities. Logically, a cube is a data repository 
that's composed of dimensions and measures. Dimen- 
sions contain descriptive attributes and hierarchies 
while measures are the facts you're describing with 
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dimensions. Measures are combined into logical 
groupings called measure groups. You tie dimensions 
to measure groups based on a granularity attribute. 

In the file system, a cube is implemented as a series 
of related binary files. The binary architecture of a 
cube facilitates its fast retrieval of large volumes of 
multidimensional data. 

I mentioned that cubes are constructed from an 
underlying relational database called a dimensional 
model. A dimensional model contains relational tables 
(fact and dimension) that correlate nicely to a cube’s 
entities. Fact tables contain measurements such as the 
quantity of a product sold. Dimension tables store 
descriptive attributes such as product names, dates, 
and employee names. Generally, fact tables are related 
to dimension tables through primary-foreign key con- 
straints, with the foreign keys residing in the fact table. 
This relational join correlates to the cube’s granularity 
attribute. When dimension tables are directly related to 
a fact table, a star schema is formed. When dimension 
tables aren’t directly related to a fact table, a snowflake 
schema is produced. 
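
The difference between the two schema shapes shows up in the joins a query needs. The sketch below again assumes the AdventureWorksDW sample database. DimProductSubcategory and DimProductCategory are not part of the subset used in this article and appear here only to illustrate a snowflake, so treat their names and key columns as assumptions to check.

```sql
-- Illustrative only: assumes the AdventureWorksDW (2005) sample database.
USE AdventureWorksDW;
GO

-- Star schema: the dimension table joins directly to the fact table.
SELECT  p.EnglishProductName,
        SUM(f.SalesAmount) AS TotalSales
FROM    dbo.FactInternetSales AS f
JOIN    dbo.DimProduct        AS p ON p.ProductKey = f.ProductKey
GROUP BY p.EnglishProductName;

-- Snowflake schema: the category table reaches the fact table only
-- through the subcategory and product tables.
SELECT  c.EnglishProductCategoryName,
        SUM(f.SalesAmount) AS TotalSales
FROM    dbo.FactInternetSales     AS f
JOIN    dbo.DimProduct            AS p ON p.ProductKey            = f.ProductKey
JOIN    dbo.DimProductSubcategory AS s ON s.ProductSubcategoryKey = p.ProductSubcategoryKey
JOIN    dbo.DimProductCategory    AS c ON c.ProductCategoryKey    = s.ProductCategoryKey
GROUP BY c.EnglishProductCategoryName;
```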

Note that dimensional models are categorized 
according to their scope. A data mart is a dimensional 
model designed for a single business process, such as sales 
or inventory. A data warehouse is a dimensional model 
designed to encompass multiple business processes, and 
thus facilitates cross-business process analytics. 


Be Prepared: Software 
Requirements 
Now that you have a basic understanding of what 
cubes are and why they’re important, I’ll switch gears 
and take you on a step-by-step tour of building your 
first cube using SSAS. There are some basic software 
components you'll need in place before building your 
first cube, so make sure your system meets these 
requirements before proceeding. 

My sample Internet Sales cube will be built from 
the AdventureWorksDW 2005 sample database. I'll be 
building the sample cube from a subset of the tables 
found in the sample database that are useful for analyzing 
Internet sales data. Figure 1 shows these tables in a basic 
database diagram. Because I'm using the 2005 database, 
you can follow along with my directions using either 
SQL Server 2008 or SQL Server 2005. The AdventureWorksDW 
2005 sample database can be found on the 
CodePlex website at msftdbprodsamples.codeplex.com. 

Figure 1: Subset of the AdventureWorks Internet Sales data mart (the diagram includes the FactInternetSales and DimProduct tables) 

Derek Comingore (dcomingore@bivoyage.com) is a principal 
architect with BI Voyage, a Microsoft Partner 
that specializes in business intelligence services 
and solutions. He's a SQL Server MVP and 
holds several Microsoft certifications. 


As mentioned above, you need access to an instance 
of SQL Server 2008 or 2005, including the SSAS and 
Business Intelligence Development Studio (BIDS) 
components. I'll be using SQL Server 2008, so you 
might see a few subtle differences if you're using SQL 
Server 2005. 


Creating the SSAS Project 
The first thing you need to do is create an SSAS project 
using BIDS. To open BIDS with the default splash 
screen, go to Start, Microsoft SQL Server 2008, SQL 
Server Business Intelligence Development Studio. 
You create a new SSAS project by selecting File, New, 
Project. You'll see the New Project dialog box, shown 
in Figure 2. Next, click the Analysis Services Project 
icon and name the project SQLMAG_MyFirstCube. 
Click OK. 

Once the project has been created, right-click the 
project in Solution Explorer and select the Properties 
context menu option. Select the Deployment option in 
the left side of the Property Pages dialog box and review 
the Target Server and Database settings, as shown in 
Figure 3, page 43. If you're working in a distributed 
SQL Server deployment, you'll need to update the Target 
Server property with the name of the SSAS server to 
which you intend to deploy. Click OK when you're satisfied 
with the new SSAS project's deployment settings. 

Figure 2: The BIDS New Project dialog box (creating a new Analysis Services project; Name and Solution Name are both SQLMAG_MyFirstCube) 


Defining the Data Source 
The first object you need to create is a data source. 
The data source object provides schema and data used 
when building the downstream cube-related objects. 
To create a data source object in BIDS, use the Data 
Source Wizard. Launch the Data Source Wizard by 
right-clicking the Data Sources folder in Solution 
Explorer and selecting the New Data Source option. 
(You'll find that creating SSAS objects in BIDS has 
a consistent development pattern. First, a wizard 
guides you through the object creation process and 
common settings. After the wizard finishes, you open 
the resulting SSAS object in a designer 
and fine tune as needed.) 

Once you're past the welcome 
screen, define a new data connection 
by clicking the New button. Create 
a new Native OLEDB\SQL Server 
Native Client 10 connection pointing 
to your designated SQL Server (which 
hosts the sample database). You can 
use either Windows or SQL Server 
authentication, depending on your 
SQL Server environment. Click the 
Test Connection button to make sure 
you've defined a valid database connection, 
then click OK. 


Next is the impersonation infor- 
mation configuration, which, like the data connection, 
depends on how your SQL Server environment is con- 
figured. Impersonation is the security context SSAS 
relies on when processing its objects. If you're running 
a basic, single-server (or laptop) deployment, as I 
assume most readers are, you can simply select the Use 
the service account option, as shown in Figure 4, page 
43. Click Next to complete the Data Source Wizard, 
and use AWDW2005 for the data source's name. Using 
the service account is fine for testing, but in production 
environments it's not a best practice. It's a better idea 
to designate a domain account as the SSAS impersonation 
account. 


Data Source View 


With your data source defined, the next step in the 
process of building an SSAS cube is to create a 
Data Source View (DSV). A DSV is 
helpful because it lets you separate the schema your 
cube expects from the underlying database's schema. 
As a result, DSVs can be used to enhance the underlying 
relational schema for the purposes of building a cube. 
Some of the DSV's key capabilities for enhancing source 
schemas include named queries, logical table 
relationships, and named calculations. 

Go ahead and right-click the Data Source Views 
folder and select the New Data Source View option 
to bring up the New Data Source View Wizard. In the 
Select a Data Source step, select the relational 
database connection you defined earlier 
and click Next. Select the FactInternetSales, DimProduct, 
DimTime, and DimCustomer tables and click 
the single arrow pointing to the right to move the tables 
over to the included column. Finally, click Next and 
finish the wizard, using the default name. 

At this point you should have a DSV, which 
is located under the Data Source Views folder in 
Solution Explorer. Double-click the new DSV to dis- 
play the DSV Designer. You should see all four tables 
in the DSV, as shown in Figure 5, page 44. 


Creating the Database 
Dimensions 

As I explained in the introduction, dimensions provide 
the descriptive attributes of measures and hierarchies 
that are used to provide non-leaf level aggregations. You 
should understand the difference between a database 
dimension and a cube dimension: database dimensions 
provide a base dimension object for multiple cube 
dimensions to be built upon. 

Database and cube dimensions provide an elegant 
solution to a concept known as role-playing dimensions. 
A role-playing dimension is a single dimension that you 
use multiple times in a cube. Date is a great 
example—in the sample cube, you'll be building a single 
date dimension and referencing it once for each date for 
which you want to analyze Internet sales. 
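
On the relational side, the same idea appears as one date table joined several times under different aliases. The sketch below assumes the AdventureWorksDW sample database, where (as far as I know) FactInternetSales carries OrderDateKey, DueDateKey, and ShipDateKey columns that all point at DimTime. Check the column names in your copy.

```sql
-- Illustrative only: assumes the AdventureWorksDW (2005) sample database.
USE AdventureWorksDW;
GO

-- One physical date table (DimTime) plays three roles:
-- order date, due date, and ship date.
SELECT TOP (10)
        f.SalesOrderNumber,
        od.FullDateAlternateKey AS OrderDate,
        dd.FullDateAlternateKey AS DueDate,
        sd.FullDateAlternateKey AS ShipDate
FROM    dbo.FactInternetSales AS f
JOIN    dbo.DimTime AS od ON od.TimeKey = f.OrderDateKey
JOIN    dbo.DimTime AS dd ON dd.TimeKey = f.DueDateKey
JOIN    dbo.DimTime AS sd ON sd.TimeKey = f.ShipDateKey
ORDER BY f.SalesOrderNumber;
```

In the cube, each alias corresponds to a separate cube dimension built on the single Date database dimension.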

Date will be the first dimension you create. 
Right-click the Dimensions folder in Solution Explorer 
and select the New Dimension option to launch the 
Dimension Wizard. Select the Use an existing table 
option and click Next in the Select Creation Method 
step. In the Specify Source Information step, select 
DimTime in the Main table drop-down and click the 
Next button. Now you need to create the Time dimension's 
attributes in the Select Dimension Attributes step. 
Select every column, as shown in Figure 6, page 44. 
Click Next. In the Completing the Wizard step, type 
Dim Date in the Name text box, and click the Finish 
button to complete the Dimension Wizard. You should 
now see a new Dim Date dimension located under the 
Dimensions folder in Solution Explorer. 

Next you'll use the Dimension Wizard to create the 
Product and Customer dimensions. Use the same steps 
to create a basic dimension that you used before. When 
you're running through the Dimension Wizard, make sure 
you select all potential attributes in the Select Dimension 
Attributes step for both dimensions. The default values 
for all other settings will suffice for the sample cube. 


Figure 3: The Target Server and Database settings 

Figure 4: Selecting the Use the service account option 

Figure 5: DSV Designer 

Figure 6: Selecting the Time dimension attributes 


Bringing It All Together: 
Building the Internet Sales 
Cube 

With your database dimensions built, you're now ready 
to build the cube. In Solution Explorer, right-click 
the Cubes folder and select the New Cube option to 
launch the Cube Wizard. Select the Use existing tables 
option in the Select Creation Method screen. Select 
the FactInternetSales table for the Measure Group 
in the Select Measure Group Tables step. Remove the 
check next to the Promotion Key, Currency Key, Sales 
Territory Key, and Revision Number measures in the 
Select Measures step and click Next. 

On the Select Existing Dimensions screen, make 
sure all existing database dimensions are checked to 
reuse them in the cube as cube dimensions. Because I 
want to keep this cube as simple as possible, uncheck 
the FactInternetSales dimension in the Select New 
Dimensions step. (By leaving the FactInternetSales 
dimension checked here, you'd be creating what's 
called a Fact dimension or degenerate dimension. Fact 
dimensions are dimensions that are created by using 
an underlying fact table as opposed to a traditional 
dimension table.) 

Click the Next button to advance the wizard to the 
Completing the Wizard step and type "My First Cube" 
in the cube name text box. Click Finish to complete the 
Cube Wizard process. 


Deploying and Processing 
the Cube 

You're now ready to deploy and process your first 
cube. Right-click the new cube in Solution Explorer 
and select the Process option. You'll see a dialog box 
informing you that the server content appears to be 
out of date. Click Yes to deploy your new cube to the 
target SSAS server. When you deploy a cube, you're 
technically sending XML for Analysis to the target 
SSAS server, which creates the cube on the server. As I 
mentioned earlier, processing a cube populates its 
binary files on disk with data from the underlying 
data source, including the additional dimensional 
metadata you've added (dimension, measure, and 
cube settings). 

Once the deployment process is complete, a new 
Process Cube dialog box is displayed. Click the Run 
button to process the cube, and a Process Progress 
dialog box will be displayed. Once cube processing has 
completed, click the Close button (twice to close out 
both dialog boxes) to complete the cube's deployment 
and processing. 

You've now built, deployed, and processed 
your first cube. You can browse your new cube by 
right-clicking the cube in Solution Explorer and 
clicking Browse. Drag and drop the measures in the 
center of the pivot table and the dimension attributes 
on the rows and columns to explore your new cube. 
Observe how fast the cube returns your various 
aggregation queries and think back to my earlier 
discussion about the weaknesses of relational database 
aggregation queries. You should now comprehend the 
raw power, and thus the business value, of an OLAP 
cube. 
InstantDoc ID 102930 


DBCC CHECKDB on Very Large Databases 
Use the Admin/Worker Job approach 


DBCC CHECKDB is the T-SQL command 
that checks the logical and physical integrity 
of all the objects in a specified database. 
Most DBAs probably don't think twice about running 
DBCC CHECKDB regularly, until their databases 
start to get very large. As the size of your database 
increases, you'll encounter various challenges in running 
DBCC CHECKDB. For example, the time it 
takes to complete a full DBCC CHECKDB process 
might become prohibitive. In addition, there might not 
be enough data space for the snapshot created during 
the DBCC CHECKDB process. 

Besides the challenges of dealing with very large 
databases (VLDBs), your job must also be intelligent 
enough to recognize new databases, dropped databases, 
and databases that are offline or otherwise unavailable, 
such as a mirrored database. To deal with these problems, 
I created a simple solution using what I call the 
Admin/Worker Job concept. In the following sections, 
I discuss the Admin Job and the Worker Job, and I 
explain how the @VLDB parameter functions. All the 
scripts in this article will run on both SQL Server 2008 
and SQL Server 2005. 


The Admin Job 

Web Listing 1 (www.sqlmag.com, InstantDoc ID 
102873) contains a script called ServerDailyMaintenance.txt. 
Running this script creates a SQL Server 
Agent job, the Admin Job. The Admin Job is the only 
job that is actually scheduled to run; it creates/updates 
and starts the Worker Job. 

Figure 1 shows the Admin Job's main step, 
which is to run msdb.dbo.mnt_DBCC. Web Listing 2 
contains the mnt_DBCC stored procedure; this 
stored procedure identifies the available databases 
and begins to construct the Worker Job called 
Maintenance_DBCC_CHECKDB. 

Figure 2, page 48, lists mnt_DBCC's parameters 
and their acceptable values, including what each value 
is used for. For system-only databases (model and 
master), the @system_only parameter should be 1. To 
perform DBCC CHECKDB with the physical_only 
option, pass 1 to the @physical_only parameter. For 
most databases, you'll keep 0 values for the @system_only 
and @physical_only parameters. For VLDBs, 
you might want to pass 1 to the @VLDB parameter. 
If you pass 1 to @VLDB, then you must also pass a 
value for the @days parameter. The @days parameter 
is ignored if @VLDB is 0. A value of 0 for @VLDB 
means the regular DBCC CHECKDB command will 
be executed. 
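
As a usage sketch, the calls below show the parameter combinations just described. They assume you have already created mnt_DBCC in msdb from Web Listing 2 (and mnt_DBCC_VLDB from Web Listing 3 for the VLDB case). The parameter names come from Figure 2, and the value 7 for @days is only an example.

```sql
-- Assumes msdb.dbo.mnt_DBCC exists (created from Web Listing 2).
USE msdb;
GO

-- Option 1: regular DBCC CHECKDB.
-- With @VLDB = 0, any value passed for @days is ignored.
EXEC dbo.mnt_DBCC
     @system_only   = 0,
     @physical_only = 0,
     @VLDB          = 0;

-- Option 2: VLDB mode.
-- @days must be supplied; 7 spreads the DBCC CHECKTABLE work over a week.
EXEC dbo.mnt_DBCC
     @VLDB = 1,
     @days = 7;
```

In practice the Admin Job makes this call for you, so you would normally pick one variant and place it in the job step rather than run both.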


MORE on the WEB 
Download the listings at 
InstantDoc ID 102873. 


David Paul Giroux (davigi@microsoft.com) is a Microsoft DBA, 
supporting SQL Server for the Xbox Live and 
Zune online services. He's an MCITP: Database 
Administrator and MCITP: Database Developer 
for SQL Server 2008/2005 and an MCDBA for 
SQL Server 2000. 


Figure 1: Running the Admin Job for server daily maintenance 

  Step name: Start DBCC CHECKDB 
  Type: Transact-SQL script (T-SQL) 
  Command (the portion legible in the figure): 
    GO 
    WAITFOR DELAY '00:00:10' 
    EXEC msdb.dbo.sp_start_job 
        @job_name = N'Maintenance_DBCC_CHECKDB' 


Figure 2: mnt_DBCC parameters and values 

mnt_DBCC 
Creates SQL Server Agent Job: Maintenance_DBCC_CHECKDB. 

A. Syntax 
mnt_DBCC 
    [ @system_only = ] system_only, 
    [ @physical_only = ] physical_only, 
    [ @VLDB = ] VLDB, 
    [ @days = ] days 

B. Arguments 
[ @system_only = ] system_only 
    If set to 1, DBCC CHECKDB will be performed on master and msdb only. system_only is bit with a default value of 0. 
[ @physical_only = ] physical_only 
    If set to 1, DBCC CHECKDB will be performed with the PHYSICAL_ONLY argument. physical_only is bit with a default value of 0. 
[ @VLDB = ] VLDB 
    If set to 1, then the components of DBCC CHECKDB will be broken out. DBCC CHECKALLOC and DBCC CHECKCATALOG will still be performed daily; however, the number of tables to perform DBCC CHECKTABLE on will vary based on the value passed to @days. If set to 1, @physical_only is ignored. VLDB is bit with a default value of 0. 
[ @days = ] days 
    Is the number of days to spread the load for DBCC CHECKTABLE. days is only used with VLDB and is otherwise ignored. days is tinyint with a default value of 7. 

C. Remarks 
mnt_DBCC must be run from the msdb database. Certain databases are excluded. @VLDB overrides @physical_only. mnt_DBCC will work for SQL Server 2005 or 2008 only. 


Figure 3: mnt_DBCC_VLDB parameters and values 

mnt_DBCC_VLDB 
Used to create a set of DBCC commands to be used by mnt_DBCC. 

A. Syntax 
mnt_DBCC_VLDB 
    [ @days = ] days, 
    [ @db = ] 'db', 
    [ @version = ] version, 
    [ @results = ] 'results' OUTPUT 

B. Arguments 
[ @days = ] days 
    The value determines the number of groups to separate tables into for the DBCC CHECKTABLE command. The weight of each table is spread as evenly as possible amongst the groups. days is int with a default value of 7. 
[ @db = ] 'db' 
    The database to create the DBCC commands for. db is sysname, with no default. 
[ @version = ] version 
    The SQL Server version which is passed from mnt_DBCC. version is smallint, with a default value of 2008. 
[ @results = ] 'results' OUTPUT 
    The result set if the stored procedure runs successfully. results is an output variable of the type varchar(MAX), with a default of NULL. 

C. Remarks 
The procedure calculates the size of each table and then spreads the weight into a number of groups based on @days. The result set includes DBCC CHECKALLOC and DBCC CHECKCATALOG commands and a set of tables for DBCC CHECKTABLE. 
mnt_DBCC_VLDB must be run from the msdb database. mnt_DBCC_VLDB is required for mnt_DBCC if @VLDB = 1 is passed to mnt_DBCC. 


If you set the mnt_DBCC stored procedure's 
@VLDB parameter to 1, mnt_DBCC will call the 
mnt_DBCC_VLDB stored procedure, which Web 
Listing 3 contains. Figure 3 lists mnt_DBCC_VLDB’s 
parameters and values. 


The Worker Job 

The Worker Job that the Admin Job creates and starts 
is called Maintenance_DBCC_CHECKDB. Every 
Worker Job step has a subsequent error-checking 
step. Figure 4 shows a sample step from the Worker 
Job for the AdventureWorks database, where the 
@VLDB parameter is set to 1 and the @days param- 
eter is set to 7. 


Using the Admin/Worker Job concept means every 
Worker Job is dynamic, because it's modified nightly. 
You won't lose any job history for the Worker Job, 
because the job is updated rather than being dropped 
and re-created each time. 


How Does the @VLDB 
Parameter Work? 

If the @VLDB parameter is set to 0, the Worker 
Job will run the simple DBCC CHECKDB process. 
The magic happens when the @VLDB parameter is 
set to 1. (You have to test to determine the number 
of days to set for the @days parameter in your 
environment.) 

When the @VLDB parameter is 
set to 1, mnt_DBCC makes a call to 
mnt_DBCC_VLDB, which audits 
all of the user tables, system tables, 
indexed views, and internal tables 
in the database and determines the 
size of each. The tables are then 
separated into a number of groups 
equal to the number passed for the 
@days parameter. The tables are 
spread as evenly as possible across 
the groups, so the load on the server 
should be roughly the same each night. 
Each group of tables is placed in a 
numbered group (VLDB_GROUP) 
based on the @days value. The 
numbered group to run on any 
given day is based on the formula 

VLDB_Group = DATEDIFF(dd, N'01-01-2009', GETDATE()) % @days 

This formula means the subsequent 
group will always run, no 
matter when the job started. The 
job always knows which groups ran 
and which is next to run without 
needing to store data in a table somewhere. 

To illustrate how the @VLDB parameter works, 
let’s consider an example. Suppose you have 10 tables, 
and you pass 2 for the @days parameter. The stored 
procedure locates the tables and calculates their size, 
putting the largest table in VLDB_Group 0, the next 
largest in VLDB_Group 1, then back to VLDB_Group 
0, and so on until all the tables are grouped. 
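
The round-robin assignment in this example can be sketched in a few lines of T-SQL. This is not the code from Web Listing 3. It is a simplified illustration that looks only at user tables in the current database and uses reserved pages from sys.dm_db_partition_stats as the size measure. The variable name @days simply mirrors the article's parameter.

```sql
-- Simplified sketch, not the mnt_DBCC_VLDB procedure itself.
-- Run it in the database whose tables you want to group.
DECLARE @days int;
SET @days = 2;

SELECT  s.name AS schema_name,
        t.name AS table_name,
        SUM(ps.reserved_page_count) AS reserved_pages,
        -- Rank tables from largest to smallest, then deal them out
        -- round-robin: rank 1 -> group 0, rank 2 -> group 1, rank 3 -> group 0, ...
        (ROW_NUMBER() OVER (ORDER BY SUM(ps.reserved_page_count) DESC,
                                     s.name, t.name) - 1) % @days AS VLDB_Group
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.tables  AS t ON t.object_id = ps.object_id
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_pages DESC, s.name, t.name;
```

The schema and table names in the ORDER BY act as a tie-breaker so that tables of equal size land in the same group on every run.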

If you run 


SELECT DATEDIFF(dd, N'01-03-2009', GETDATE()) % 2 


the result will either be 0 or 1. If today is 0, tomorrow 
will be 1, the next day 0, and so on. This is how each 
group of tables will run without the job storing infor- 
mation about which tables are in each group. 
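
To see the alternation without waiting a day, you can substitute fixed dates for GETDATE(). The sketch below uses four consecutive dates chosen only for illustration, and it writes the dates in the yyyymmdd form, which SQL Server interprets the same way regardless of language settings.

```sql
DECLARE @days int;
SET @days = 2;

-- Four consecutive run dates stand in for GETDATE().
SELECT  d.RunDate,
        DATEDIFF(dd, '20090103', d.RunDate) % @days AS VLDB_Group
FROM   (SELECT CAST('20090105' AS datetime) AS RunDate
        UNION ALL SELECT CAST('20090106' AS datetime)
        UNION ALL SELECT CAST('20090107' AS datetime)
        UNION ALL SELECT CAST('20090108' AS datetime)) AS d
ORDER BY d.RunDate;

-- VLDB_Group by date:
--   2009-01-05 -> 0   (2 days after the anchor date, 2 % 2 = 0)
--   2009-01-06 -> 1   (3 % 2 = 1)
--   2009-01-07 -> 0   (4 % 2 = 0)
--   2009-01-08 -> 1   (5 % 2 = 1)
```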

But what happens if the second-largest table on 
day 1 becomes the largest table on day 2? The table 
will be missed because it shifted to group 1. What if 
table 1 and table 2 both grow or shrink but the rela- 
tive sizes stay the same? In that case, the tables won’t 
be missed. In most cases, the largest table will always 
be the largest table and the second-largest table will 
always be the second-largest table; no tables will ever 
be missed because the relative sizes will stay the same. 

  Step name: DBCC CHECKDB AdventureWorks 
  Type: Transact-SQL script (T-SQL) 
  Command: 

    IF DB_ID('AdventureWorks') IS NULL 
    BEGIN 
        RETURN 
    END 

    DBCC CHECKALLOC ([AdventureWorks]) WITH NO_INFOMSGS; 
    DBCC CHECKCATALOG ([AdventureWorks]) WITH NO_INFOMSGS; 
    GO 
    DBCC CHECKTABLE ([AdventureWorks.Person.CountryRegion]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Production.Document]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Production.Location]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Production.ProductListPriceHistory]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Production.ProductModel]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Production.WorkOrder]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Purchasing.ProductVendor]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Sales.ContactCreditCard]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Sales.CustomerAddress]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Sales.Individual]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Sales.SalesTaxRate]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.Sales.SpecialOffer]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.sysbinobjs]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.syscerts]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.sysprivs]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.sysremsvcbinds]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.sysscalartypes]) WITH NO_INFOMSGS; 
    DBCC CHECKTABLE ([AdventureWorks.sys.syssqlguides]) WITH NO_INFOMSGS; 

If you want a guarantee that no tables will ever be 
missed, you'll have to modify the code to store data in 
a table somewhere. Save the groups on day 1 and refer 
to the table throughout the cycle. In addition, you must 
verify that no tables were dropped or added since the 
calculation on day 1. Growth isn’t an issue (e.g., if your 
largest table today becomes your second-largest table 
tomorrow). 
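
If you do decide to persist the groups, one possible shape is sketched below. The table name mnt_DBCC_groups and its columns are hypothetical (nothing like it appears in the article's listings), and AdventureWorks stands in for whichever database you are checking. The two queries cover the verification step: they list tables added or dropped since the groups were saved.

```sql
-- Hypothetical helper table; not part of the article's listings.
USE msdb;
GO
CREATE TABLE dbo.mnt_DBCC_groups
(
    database_name sysname  NOT NULL,
    schema_name   sysname  NOT NULL,
    table_name    sysname  NOT NULL,
    VLDB_Group    int      NOT NULL,  -- group assigned on day 1 of the cycle
    cycle_start   datetime NOT NULL,  -- when the groups were calculated
    CONSTRAINT PK_mnt_DBCC_groups
        PRIMARY KEY (database_name, schema_name, table_name)
);
GO

USE AdventureWorks;  -- assumed example database
GO

-- Tables added since the groups were saved (present now, not in the saved set).
-- COLLATE DATABASE_DEFAULT avoids a collation conflict if msdb and this
-- database use different collations.
SELECT  s.name AS schema_name, t.name AS table_name
FROM    sys.tables  AS t
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id
WHERE   NOT EXISTS (SELECT 1
                    FROM   msdb.dbo.mnt_DBCC_groups AS g
                    WHERE  g.database_name = DB_NAME()
                      AND  g.schema_name COLLATE DATABASE_DEFAULT = s.name COLLATE DATABASE_DEFAULT
                      AND  g.table_name  COLLATE DATABASE_DEFAULT = t.name COLLATE DATABASE_DEFAULT);

-- Tables dropped since the groups were saved (in the saved set, gone now).
SELECT  g.schema_name, g.table_name
FROM    msdb.dbo.mnt_DBCC_groups AS g
WHERE   g.database_name = DB_NAME()
  AND   OBJECT_ID(QUOTENAME(g.schema_name) + N'.' + QUOTENAME(g.table_name), N'U') IS NULL;
```

The trade-off is the one the article points out: the formula-only approach needs no state, while a saved table adds bookkeeping in exchange for the guarantee.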

When the @VLDB parameter is set to 1, it also 
creates a Worker Job that will run DBCC CHECK- 
ALLOC and DBCC CHECKCATALOG every time, 
as Figure 4 shows. 


Put It to Use 
The DBCC CHECKDB job that uses the Admin/ 
Worker Job method is both intelligent and 
maintenance-free, and you can run it on any size 
database. The job automatically runs against 
databases you add and removes dropped databases 
from the job. In addition, the job can 
identify whether a database is mirrored or is 
otherwise offline. You can specify whether to run 
the job against only the system databases and 
whether to use the PHYSICAL_ONLY option. One of the 
best features is that you can spread the DBCC 
CHECKDB load over any number of days that 
you specify. 
InstantDoc ID 102873 


Figure 4: Sample step from Worker Job: Maintenance_DBCC_CHECKDB 


The SQL Server Community: 
A Welcoming Place 
I eat, sleep, and drink SQL Server. It helps me pay the 
bills, keeps me challenged, and never disappoints 
my thirst for knowledge. And while there is plenty 
of focus on extolling SQL Server's technical merits, 
I wanted to address a couple of unsung benefits of 
SQL Server. 


Benefits in Perspective 

People tend to be passionate about their databases. 
Oracle, MySQL, and PostgreSQL all have zealots who 
are just as passionate about their platforms as I am 
about SQL Server. As such, I'm not writing about SQL 
Server's unsung benefits in the sense that they don't 
exist elsewhere or with other DB platforms. If they 
do, great. But I couldn't care less. 

Instead, I'm writing about things that I really love 
about SQL Server. Things that make it a pleasure to 
work with. Things that really resonate with my personal 
philosophies of always striving to be open, to share 
knowledge, to act professionally, and to never be 
stingy or selfish. 


Benefit 1: User Community 

Most people don’t know this, but my background is in 
Near Eastern Studies. In college I studied Semitic cul- 
tures and traditions, the Holocaust (or Shoah), Arabic 
and Islam, ancient scriptures, and even Mongolian 
interactions in the Middle East during the period of the 
crusades. My degree didn’t do a whole lot to prepare 
me for a career change into development and database 
administration. 

Happily though, with some hard work, serious 
desire, and the help of some truly selfless people who 
volunteered time in forums and SQL Server news- 
groups to answer sometimes dumb questions on web 
development and SQL Server administration, I was 
able to not only cope but thrive and succeed. 

Early on in my career, I also did a stint with PHP 
and MySQL. And while I enjoyed aspects of both of 
those technologies, asking stupid questions in forums 
dedicated to those technologies seldom met with the 
same kind of patient, helpful responses that I grew 
to love in SQL Server and ASP.NET forums. I hope a 
lot has changed in the many years since I used those 
technologies, but my perception of many of the forums 
for PHP and MySQL was that they were more of a 
place for users to show off their prowess and skill. Or, 
in other words, they were filled with lots of arrogant, 
self-righteous condescension that frequently resulted 
in mocking people who didn’t know the answers to 
everything. And I’m not speaking from some place of 
personal trauma, because I was always too afraid to 
ask questions, given some of the responses I saw. 

With SQL Server and ASP.NET groups, I’ve noticed 
an entirely different approach and spirit. Mocking or 
ridiculing someone in the ASP.NET or SQL Server 
forums that I’ve participated in is not only rare, but 
offenders typically end up getting called out on the 
carpet by their peers, even in cases when no offense 
was meant, because there’s a culture that encourages 
newcomers and resists any air of superiority. 

Even better is the fact that this same culture and 
spirit of helpfulness seems to translate across mediums. 
For example, I could ask a question on Twitter today 
about a specific SQL problem or what the name of a 
certain command is and I’d typically have answers and/ 
or insights within just a few minutes. More importantly, 
virtually no one from within the SQL Server commu- 
nity would call me an idiot, tell me to switch jobs, or 
quit wasting everyone’s time. 

In fact, I’d wager that a few people reading this 
post have no idea what “RTFM n00b!” means. And, 
sadly, that’s just not the case with all other communi- 
ties. But this cultural strength is a real gem that often 
goes unsung when it comes to interacting with the SQL 
Server community. 


Benefit 2: Organizational 
Community 

The great culture that persists among individuals 
within the SQL Server community appears to run 
deep—organizations of SQL Server users tend to share 
those same sentiments in aggregate. In other words, not 
only do individual SQL Server users enjoy a strong 
sense of community and strive to be truly helpful 
instead of selfish and stingy, but I think there’s a 
definite trend among organized groups of SQL Server 
users to behave the same way. 

Organizations that have been around for awhile, such 
as PASS and SSWUG (which facilitate user groups and 
skills acquisition), along with SQL Server Magazine 
(written for and by SQL Server users with an online and 
print presence), don’t tend to really compete with each 
other. The same goes for newer arrivals on the scene, such 
as blog aggregators and the SQL Server wikis. Being a 
member or participant in any one of these organizations 
or mediums isn’t like being in a biker gang; it doesn’t
preclude you from participating in another. And it won’t get
you beat up. Instead, the more organizations or mediums
you subscribe to or participate in, the more you benefit
as a user and the more you can interact with peers and
learn.
Although all of these organizations definitely have
keen interest in developing user audience and attention,
the unspoken competition between these groups
is invisible to most SQL Server users and rarely gets
ugly. In fact, you’ll commonly see these groups
selflessly promote each other in order to ensure
that members have the greatest possible access to
the skills, tools, instruction, and guidance that
they need to successfully complete their jobs. To
me, that’s professionalism that you won’t find elsewhere.
The primary goal of all of these organizations
and mediums is to give DBAs, developers,
and IT pros the tools, knowledge, skills, and
support they need. In other words, it’s almost like
there’s an unspoken code that promotes civility and
really places service above self interest. I don’t know
if that happens in other communities. And I don’t
care. I only know that I really appreciate that aspect
of using SQL Server, and I think it goes unsung far
too often.


Benefit 3: Vendor Community 
As a DBA and database developer, I’ve had the chance 
to work with a wide variety of tools from various ven- 
dors providing solutions for SQL Server. I’ve also had 
the chance to review a number of different products 
and solutions for reviews in SQL Server Magazine. 
And, as a consultant, I’ve also had the chance to 
evaluate tools and solutions for clients and customers. 
I’ve observed how all of the major vendors of SQL 
Server solutions—such as Quest Software, Idera, and 
Red Gate Software—have embraced the idea of honestly 
trying to educate and empower their customers and SQL 
Server users in general. And while there’s no doubt that 
they eventually hope to turn that rapport into a sale, 
there’s a definite dignity, professionalism, and openness 
about how they seek to find new customers that I really 
love. In fact, I'd say that they’ve all figured out that if they 
can help make their customers succeed, they’re helping 
further an environment where their tools and solutions 
can continue to be marketed. But what’s great is how they 
strive to build that success for their potential customers. 
Ultimately, I don’t know that you could get that
kind of feeling anywhere else. Hopefully it exists elsewhere.
All I know is that it exists within the SQL Server
vendor community and it’s another huge, unsung 
benefit that comes with using SQL Server. 


Microsoft’s Role 

I’m not afraid to take Microsoft to task when needed. 
But it’s only fair that I hand out kudos where they're 
deserved as well. To that end, in discussing the 
strengths of the SQL Server community, I would be 
remiss in not acknowledging Microsoft’s influence and 
help in making the community as successful as it is. 

At the individual level, I don’t think that the MVP 
program is the sole source of what makes SQL Server 
users (or .NET developers) so willing to help each 
other. But there’s no doubt that the MVP program 
recognizes selfless service and helps keep it alive. 
Microsoft's recognition of this aspect of community 
involvement says tons about the kind of community 
and culture they want to engender around the con- 
tinued sale and support of their products. 

Microsoft has also done a great job in championing 
and even helping to fund and support user groups 
and professional organizations aimed at empowering 
users of their products. Obviously, that makes great 
business sense for them, and it appears to pay off well. 
But Microsoft also puts a lot of tangible effort into its 
products’ user communities and that’s something you 
don’t see everywhere. 

Finally, I wonder if Microsoft doesn’t have some 
roundabout role to play in creating the sense of com- 
munity fostered by vendors. Much of this is defined by 
the caliber of folks working at these companies, many 
of whom have carried SQL Server community ties and 
culture into their organizations. But competing with 
Microsoft can be hard, even when we're just talking 
about filling in the gaps with products and services that 
Microsoft has left off. Consequently, many businesses 
that work to fill these gaps have adopted a culture of 
“embrace and extend,” at least nominally. But that’s still 
a much different mindset than “seek and destroy,” and I 
wouldn't be surprised to see some form of link between 
that reality and the great way in which vendors within 
the SQL Server community comport themselves. 


I Love This Community 
Regardless of the sources of these benefits, they 
frequently go unsung. Yet they're things that make 
working with SQL Server better, easier, and more cost 
effective. They’re also paradigms that resonate deeply 
with my own sense that when you do right by everyone 
(even when it sometimes seems hard or scary), karma 
takes care of you in the end. Accordingly, I love being a 
part of the SQL Server community in every way, shape, 
and form. [SQL] 
—Michael K. Campbell


Business Intelligence for DBAs

I was reading Michael Otey’s article, “Getting Started
with BI” (InstantDoc ID 102642), and it compelled
me to write a follow-up. This article offers a brief
tour of what business intelligence (BI) is, why it is
so valuable for companies to implement, and how
traditional Microsoft BI solutions are implemented.


What BI Can Do 

Today, most organizations have some form of reporting 
or analytics. A “normal” company will have reports 
running directly off of various OLTP databases. These 
reports produce useful business measures for that one 
operational system. In another area of the company, 
similar reports are built for a different operational 
system. And there might even be some form of a 
corporate intranet that hosts one or both sets of these 
reports for thin-client viewing. Spreadsheets continue 
to dominate the “normal” company as they are flung 
in-between meetings with various edits being made to 
their metrics. The CEO might even receive two different
reports that show different results for the same metric. The
situation I am describing is still “par for the course” at most
companies today.

Now, can you imagine working in a company 
where information is consistent, reliable, and flows 
throughout? An organization that ensures executives 
are constantly aware of how their business is per- 
forming, where analysts explore free-flowing models 
to locate hidden trends, and where operations has 
constant visibility for their daily decisions wrapped 
with seamless collaboration? What I am describing 
to you is a 21st century Intelligent Enterprise. Such a 
company is viable through business intelligence. 


From Chaos to Order 

So why is it hard to obtain this Intelligent Enterprise 
model in today’s world? As Michael Otey said, cost and 
scope are the big reasons why many BI projects never 
get going. Most BI industry experts will tell you only 20 
percent of companies have implemented BI solutions 
and thus the “normal” scenario I described above still 
prevails. However, looking forward, more and more 
rapid BI solutions are becoming available. 

Rapid BI is enabled by any combination of tools, 
technologies, and processes. Agile software development 
practices are beginning to be leveraged for BI solution 
development. The Kimball methodology of designing 
and delivering individual (yet interconnected) data marts 
enables much quicker ROI for enterprises. Cloud services (e.g.,
SQL Azure) will enable small hosted data marts.
Microsoft’s Fast Track Data Warehouses provide a reus- 
able hardware reference framework to remove the burden 
of hardware architecture design for enterprise data 
warehouse (DW)/BI solutions. Finally, any premier BI 
professional services firm should be more than willing
to provide a pilot project for your company. BI pilot
projects usually translate into the creation of a test data
mart (but not always). BI pilot projects should last no
longer than six weeks.


Other Issues 

There are a few more Intelligent Enterprise topics I 
would like to discuss before we dive into the world of 
DW/BI terminology and how those terms and entities 
map to an end-to-end, elegant Microsoft BI solution. 


Organizational Culture 
While technology is very useful and powerful, it only 
solves the enablement piece of the Intelligent Enter- 
prise puzzle. Organizations must commit to change 
to grow their business through strategic corporate IT 
initiatives. Becoming an Intelligent Enterprise does not 
end when a data mart and some reports are completed. 
Becoming an Intelligent Enterprise means committing 
to a cultural change where the company empowers 
workers with knowledge. Furthermore, you commit the 
organization to continuous intelligence improvement 
by gaining more insight into your business, which will 
result in enriched analytics (and thus business value). 
I cannot emphasize enough the commitment of the business
side to BI projects. While I understand all too well
that BI is implemented in technology, it is
100 percent business driven! The most successful BI projects
I’ve witnessed started with someone
outside of IT (usually the CEO) who became aware of the
power and value of adopting BI and drove the “BI
stake” hard into the corporate ground. Incidentally,
the enterprises whose BI was led by the business side are
still benefiting heavily from the resulting solutions
to this day. They have a significant
competitive advantage as a result.


Rapid BI & Project Gemini 

Gemini will bring about a new era of Rapid BI by 
taking some (not all, not nearly all) of the development 
workload out of IT’s hands. So yes, Gemini definitely 
qualifies as a Rapid BI platform. There are two very 
different yet strongly correlated worlds of BI: Tradi- 
tional and Self-Service. 

Traditional BI is what I and others like me do for a 
living. When a company wants “BI” we step in, gather 
requirements from key stakeholders, and start the 
design sessions (which translate into many technical 
artifacts thereafter). We are building the “Corporate 
Truth” for key business processes. 

Self-Service BI complements Traditional BI by
allowing Information Workers to build their own analytical
models (most likely sourcing some of their data
from a data warehouse). How does Self-Service BI
complement Traditional BI? Let's say Sue in marketing
creates a new self-service analytical model. Sue publishes
the new model to a collaboration server.
Sue’s colleagues really like the analytics
she has created. As a result, the business
now takes this model to “us” (the BI
experts) and asks us to recreate this model
in a Traditional BI capacity. Voilà! I could
be wrong, but from what the experts at
Redmond are saying, this synergy of Self-Service
and Traditional BI is where more
value will come from.


Obstacles

As a new BI entrepreneur, making quite
a few blunders (and learning along the
way), I try my best to keep my finger
on the BI pulse. There are two major
reasons why companies that want BI do
not further engage:

• Traditional BI engagements are big,
bet-the-business endeavors. ROI
cannot be realized quickly as a result.

• There is a lack of available BI expertise.


Rapid BI solutions help a company 
overcome the first objection cited above. 
Regarding the second objection, com- 
panies that invest in BI produce more 
BI experts as a result (chicken and the 
egg). Support the DBA who wants to 
do BI in developing a small data mart 
to play with. Engage with a consulting 
company for a pilot BI project. These 
are small, incremental steps that can 
produce initial BI-based value. 


There is no way I can explain every 
detailed BI concept in a book, let alone 
a blog post. One of the challenges 
associated with building and delivering 
BI solutions is the sheer volume of 
available architectures, technologies, 
and processes. With every BI project I 
participate in, I learn something new. 
So my first message is: don’t become so 
overwhelmed by the forest that you miss 
the trees. No one is an expert in every BI 
technology or platform. But you can be 
an expert in selective areas and maintain 
a good understanding of the others. 

A good way to begin learning BI is to 
comprehend the common architectures 
(how each tier works with another) and 
dive into the areas you wish to master. 

The process starts with the various
OLTP databases the business wishes to
collect data from (visit www.sqlmag.com, InstantDoc
ID 102943 to see a sample BI infrastructure). These targeted
data sources provide us with the copper we will
transform into gold. We use Extraction, Transformation,
and Loading (ETL) processes to collect and mold the
data into a format that is useful for denormalized database
schemas (data marts & warehouses).
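To make the ETL idea concrete, here is a minimal T-SQL sketch. Every table, column, and value in it is hypothetical and exists only for illustration, and table variables stand in for real source and target databases. A production Microsoft BI solution would more likely use a dedicated ETL tool than a single script, so treat this as a picture of the concept rather than a recommended implementation. It requires SQL Server 2008 or later for the multi-row VALUES syntax.

```sql
-- Hypothetical normalized OLTP source tables (illustrative names and data).
DECLARE @Orders TABLE (OrderID int, OrderDate datetime, CustomerName varchar(50));
DECLARE @OrderLines TABLE (OrderID int, Product varchar(50), Qty int, UnitPrice money);

INSERT INTO @Orders VALUES
    (1, '20100115', 'Contoso'),
    (2, '20100220', 'Fabrikam');
INSERT INTO @OrderLines VALUES
    (1, 'Widget', 2, 10.00),
    (1, 'Gadget', 1, 25.00),
    (2, 'Widget', 5, 10.00);

-- Hypothetical denormalized target, shaped for reporting.
DECLARE @SalesMart TABLE (
    OrderMonth   char(6),
    CustomerName varchar(50),
    Product      varchar(50),
    SalesAmount  money
);

-- Extract (join the sources), transform (derive month and amount), load (insert).
INSERT INTO @SalesMart (OrderMonth, CustomerName, Product, SalesAmount)
SELECT CONVERT(char(6), o.OrderDate, 112),  -- style 112 is yyyymmdd; char(6) keeps yyyymm
       o.CustomerName,
       l.Product,
       l.Qty * l.UnitPrice
FROM @Orders AS o
JOIN @OrderLines AS l
    ON l.OrderID = o.OrderID;

SELECT OrderMonth, CustomerName, Product, SalesAmount
FROM @SalesMart;
```

The point of the sketch is the shape of the work: the reporting table repeats the customer name on every row, which is acceptable in a denormalized schema because it spares report queries the joins.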

There is an optional layer called the Operational 
Data Store (ODS). The ODS is a normalized solution 
to enable right-time analytics. If you do employ an 
ODS, your ETLs must then move and transform the 
data into the downstream Data Mart & Warehouse 
thereafter. 

The data mart (or warehouse) tier is simply denor- 
malized schemas that are useful for reporting and 
analytical consumption purposes. These relational 
databases are designed to combine third normal form 
tables into star and snowflake schemas. Dimension 
tables contain descriptive attributes and hierarchies 
while fact tables contain the actual measurements. 
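As a rough illustration of a star schema, the following T-SQL sketch defines one dimension table and one fact table, then aggregates a measure by a descriptive attribute. The table and column names (DimProduct, FactSales, and so on) are assumptions made up for this example; they do not come from any sample database or from the architecture diagram referenced above.

```sql
-- Hypothetical dimension table: descriptive attributes and a simple hierarchy.
CREATE TABLE dbo.DimProduct (
    ProductKey  int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    ProductName nvarchar(100) NOT NULL,
    Category    nvarchar(50)  NOT NULL          -- hierarchy level above product
);

-- Hypothetical fact table: foreign keys to dimensions plus numeric measures.
CREATE TABLE dbo.FactSales (
    DateKey     int   NOT NULL,                 -- e.g., 20100131
    ProductKey  int   NOT NULL
        REFERENCES dbo.DimProduct (ProductKey),
    Quantity    int   NOT NULL,                 -- measure
    SalesAmount money NOT NULL                  -- measure
);

-- Typical reporting query: group the measures by a dimension attribute.
SELECT d.Category,
       SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS d
    ON d.ProductKey = f.ProductKey
GROUP BY d.Category
ORDER BY TotalSales DESC;
```

A snowflake schema would go one step further and move Category into its own table referenced from DimProduct.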

The OLAP server contains cubes that are useful 
for performing Fast Analysis of Shared Multi-Dimen- 
sional Information (FASMI). With a cube in place, a 
business analyst can slice-and-dice analytics to gain a 
better understanding of how (and why) the business 
operates in the capacity it does. Technically, there is a 
second major component of the analytics server called 
data mining. Data mining is, unfortunately, a seldom- 
used BI component. For now, just understand that data 
mining is an optional BI component that is employed 
for predictive analytics. 

Finally we have the consumption layer. BI con- 
sumption is where the forest grows quite large. Col- 
laboration and analytical servers are usually deployed 
in-between the core BI infrastructure and the clients in 
the consumption layer. These servers provide the client 
applications with additional functionality beyond the 
pure information provided by the core infrastructure. 

There are numerous thick and thin BI client 
applications on the market today. What is consistent 
about BI consumption is the logical classifications of 
functionality they provide: 

• Reporting (tabular & aggregative)

• Analytical applications that allow one to slice-and-dice
cubes and metrics

• Dashboards and scorecards used for performance
management

• Self-service applications that are used for on-the-fly
reporting and analytical construction

• Mobile intelligence applications that facilitate consuming
BI on the go


Hopefully you now understand why and how compa- 
nies implement BI. [SQL] 
—Derek Comingore 

InstantDoc IDs 102915, 102928, and 102943


8 T-SQL String Functions

Working with strings is a common T-SQL coding
task, whether you're trimming blanks off a
string value for display or concatenating two strings
together. SQL Server's built-in functions can help:


LEN

Use LEN to determine a source string’s length. It takes
a single parameter containing a string expression.


SELECT LEN('This is string') AS Length


LEFT

The LEFT function returns characters from a string's
left side. It takes two parameters: the source string
expression and the number of characters to return.


SELECT LEFT('SQL Server 2008', 3) As SQL


SQL


SQL


RIGHT

The RIGHT function returns a specified number of
characters from a string’s right side. It also accepts a
string expression and an integer.


SELECT RIGHT('SQL Server 2008',
4) As Release


Release


2008


SUBSTRING

SUBSTRING returns a specified portion of a string.
The first parameter is the source string, the second
indicates the start position in the source string, and the
third indicates the length to return.


SELECT 'Otey' AS [Last Name],
SUBSTRING('Michael', 1, 1)
As Initial


Last Name Initial


Otey M


REPLACE 

This function replaces all instances of a specified
search string within a source string. The first parameter
is the source string expression, next is the search string,
and last is the replacement string.


SELECT REPLACE('SQL Server 2005', 
'2005' ,'2008') As [Replace Example] 


Replace Example 


SQL Server 2008 


STUFF 

STUFF inserts one string in another. The first param- 
eter is the source string expression. Next is the insertion 
point, then the number of characters to delete, and 
finally the string to be inserted. 


SELECT STUFF('SQL Services',
5, 8, 'Server') As [Stuff Example]


Stuff Example 


SQL Server 


LTRIM 
LTRIM removes leading blanks from a string. It takes 
a single string parameter. 


DECLARE @myString varchar(40) =
'     Get rid of five leading blanks'
SELECT 'The new string: '
+ LTRIM(@myString) As Example


It returns the new string minus the five leading blanks.


RTRIM 
RTRIM removes trailing blanks from a string. It takes 
a single string parameter. 


DECLARE @myString varchar(40) =
'Get rid of five trailing blanks     '
SELECT 'The new string: '
+ RTRIM(@myString) As Example


It returns the new string minus the five trailing blanks.
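
These functions are often nested. The following sketch, which uses a made-up sample value and made-up column aliases, chains several of the functions above to clean up a padded, out-of-date product name. It assumes SQL Server 2008 or later for the inline DECLARE initialization.

```sql
-- Illustrative sample value: two leading and two trailing blanks.
DECLARE @product varchar(40) = '  SQL Services 2005  ';

-- Trim both ends, swap the year, then replace 'Services' (8 characters
-- starting at position 5) with 'Server'.
DECLARE @clean varchar(40) =
    STUFF(REPLACE(LTRIM(RTRIM(@product)), '2005', '2008'), 5, 8, 'Server');

SELECT @clean           AS Cleaned,     -- SQL Server 2008
       LEFT(@clean, 3)  AS Product,     -- SQL
       RIGHT(@clean, 4) AS Release,     -- 2008
       LEN(@clean)      AS CleanLength; -- 15
```

Working from the inside out makes nested calls easier to read: each function receives the result of the one inside it.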
[SQL]